SOLID PHASE SELECTION OF DIFFERENTIALLY EXPRESSED GENES

This is a continuation-in-part of co-pending U.S. patent application Ser. No. 09/130,446 filed 6 August 1998, which is a continuation-in-part of co-pending U.S. patent application Ser. No. 09/005,222 filed 9 January 1998, which applications are incorporated by reference.



FIELD OF THE INVENTION

The invention relates generally to methods for identifying differentially expressed genes, and more particularly, to a method of competitively hybridizing differentially expressed DNAs with reference DNA sequences cloned on solid phase supports to provide a differential expression library which can be physically manipulated, e.g. by fluorescence-activated flow sorting.


BACKGROUND

The desire to decode the human genome and to understand the genetic basis of disease and a host of other physiological states associated with differential gene expression has been a key driving force in the development of improved methods for analyzing and sequencing DNA, Adams et al., Editors, Automated DNA Sequencing and Analysis (Academic Press, New York, 1994). The human genome is estimated to contain about 10^5 genes, 15-30% of which, or about 20-40 megabases, are active in any given tissue. Such large numbers of expressed genes make it difficult to track changes in expression patterns by available techniques, especially in view of the large number of genes that are expressed at relatively low levels: it has been estimated that as much as 30% of mRNA consists of many thousands of distinct species, each making up less than 0.5% of the total and typically averaging less than 14 copies per cell, Sambrook et al., Molecular Cloning, Second Edition (Cold Spring Harbor Laboratory Press, New York, 1989). Even substantial changes in expression among such low abundance mRNAs can be difficult to detect in the presence of overwhelming quantities of abundant sequences.

A variety of techniques are available for analyzing gene expression that differ widely in convenience, expense, and sensitivity. Commonly used low resolution techniques include differential display, indexing, subtraction hybridization, and numerous DNA fingerprinting techniques, e.g. Vos et al., Nucleic Acids Research, 23: 4407-4414 (1995); Hubank et al., Nucleic Acids Research, 22: 5640-5648 (1994); Liang et al., Science, 257: 967-971 (1992); Erlander et al., International patent application PCT/US94/13041; McClelland et al., U.S. patent 5,437,975; Unrau et al., Gene, 145: 163-169 (1994); and the like. Higher resolution techniques include analysis of expressed sequence tags (ESTs), e.g. Adams et al. (cited above); analysis of concatenated fragments of expressed sequences (SAGE), e.g. Velculescu et al., Science, 270: 484-486 (1995); Zhang et al., Science, 276: 1268-1272 (1997); Velculescu et al., Cell, 88: 243-251 (1997); and the use of microarrays of oligonucleotides or polynucleotides for capturing complementary polynucleotides from expressed genes, e.g. Schena et al., Science, 270: 467-469 (1995); DeRisi et al., Science, 278: 680-686 (1997); Chee et al., Science, 274: 610-614 (1996); and the like.
The latter two high resolution techniques have shown promise as potentially robust systems for analyzing gene expression; however, technical issues remain to be addressed with both approaches. In microarray systems, the genes to be monitored must be known and isolated beforehand, which means that different microarrays, or "DNA chips," have to be manufactured for each specialized use and for every different type of organism or species examined. With respect to microarrays constructed from fluid-delivered cDNAs, a significant degree of variability, e.g. 2-5 fold, exists in the signals generated under the same hybridization conditions, e.g. Atlas™ cDNA Expression System User's Manual (Clontech Laboratories, Palo Alto, 1998), and the systems are not readily re-usable. With respect to microarrays of synthetic oligonucleotides, the significant set-up cost of manufacturing such arrays and the expensive chip-reading instruments put such systems beyond the financial capability of many potential users. In sequence tag systems, although no special instrumentation is necessary, as an extensive installed base of DNA sequencers may be used, even routine expression analysis requires a significant sequencing effort, e.g. several thousand sequencing reactions or more; the selection of type IIs tag-generating enzymes is limited; and the length (nine nucleotides) of the sequence tag in current protocols severely limits the number of cDNAs that can be uniquely labeled. It can be shown that for organisms expressing large sets of genes, such as mammalian cells, the likelihood of nine-nucleotide tags being distinct for all expressed genes is extremely low, e.g. Feller, An Introduction to Probability Theory and Its Applications, Second Edition, Vol. I (John Wiley & Sons, New York, 1971).
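The low likelihood of unique nine-nucleotide tags is an instance of the birthday problem. The Python sketch below illustrates it under a simplifying assumption that is not stated in the text: each expressed gene receives a tag drawn uniformly and independently from the 4^9 possible nine-nucleotide sequences. The function name is hypothetical, and the gene counts of 15,000 and 30,000 are illustrative inputs corresponding to 15-30% of about 10^5 genes.

```python
import math


def log10_prob_all_distinct(n_genes: int, tag_length: int = 9) -> float:
    """Return log10 of the probability that n_genes random tags are all distinct.

    Assumption (illustration only): every tag is drawn uniformly and
    independently from the 4**tag_length possible sequences.
    """
    space = 4 ** tag_length
    if n_genes > space:
        return -math.inf  # pigeonhole principle: a collision is certain
    # P(all distinct) = product over i of (1 - i/space). The sum is kept in
    # log space because the probability itself underflows a float.
    log_p = sum(math.log1p(-i / space) for i in range(n_genes))
    return log_p / math.log(10)


if __name__ == "__main__":
    for n in (15_000, 30_000):
        print(f"{n} genes: P(all tags distinct) ~ 10^{log10_prob_all_distinct(n):.0f}")
```

With only 262,144 possible nine-nucleotide tags, even 15,000 expressed genes drive the probability of complete uniqueness to a vanishingly small number under this model.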

It is clear from the above that there is a need for a convenient and sensitive technique for analyzing gene expression that permits the analysis of either known or unknown genes from any source. The availability of such a technique would find immediate application not only in medical and scientific research, but also in a host of applied fields, such as crop and livestock development, pest management, drug development, diagnostics, disease management, and the like.

SUMMARY OF THE INVENTION

Accordingly, objects of our invention include, but are not limited to, providing a method for identifying and isolating differentially expressed genes; providing a method of identifying and isolating polynucleotides on the basis of labels that generate different optical signals; providing a method for profiling gene expression of large numbers of genes simultaneously; providing a method of identifying and separating genes in accordance with whether their expression is increased or decreased under any given conditions; providing a method for identifying rare genes; and providing a method for massively parallel signature sequencing of large numbers of genes isolated according to their expression.

Our invention accomplishes these and other objects by providing differently labeled populations of polynucleotides from cell or tissue sources whose gene expression is to be compared. In comparing gene expression, differently labeled polynucleotides of a plurality of populations are competitively hybridized with reference DNA cloned on solid phase supports. Preferably, the solid phase supports are microparticles which, after such competitive hybridization, provide a differential expression library that may be manipulated by fluorescence-activated cell sorting (FACS), or other sorting means responsive to optical signals generated by labeled polynucleotides on the microparticles. Monitoring the relative signal intensity of the different labels on the microparticles permits quantification of the relative expression of particular genes in the different populations.

In one aspect of the invention, populations of microparticles having relative 
signal intensities of interest are isolated by FACS and the attached polynucleotides are 
sequenced to determine the identities of the rare or differentially expressed genes. 
Preferably, the method of the invention is carried out by the following steps: a) providing a reference population of nucleic acid sequences attached to separate solid phase supports in clonal subpopulations; b) providing a population of polynucleotides of expressed genes from each of a plurality of different cells or tissues, the polynucleotides of expressed genes from different cells or tissues having different light-generating labels; c) competitively hybridizing the populations of polynucleotides of expressed genes from each of the plurality of different cells or tissues with the reference population to form duplexes between the sequences of the reference population and the polynucleotides of each of the different cells or tissues, such that the polynucleotides are present in duplexes on each of the solid phase supports in ratios directly related to the relative expression of their corresponding genes in the different cells or tissues; and d) detecting a relative optical signal generated by the light-generating labels of the duplexes attached thereto. In further preference, the method includes the step of sorting each solid phase support according to the relative optical signal detected. Preferably, the reference population of nucleic acids is derived from genes of the plurality of different cells or tissues being analyzed. As used herein, the phrase "polynucleotides of expressed genes" is meant to include any RNA produced by transcription, including in particular mRNA, and DNA produced by reverse transcription of any RNA, including in particular cDNA produced by reverse transcription of mRNA.
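Step c) relies on the labeled populations sharing each support's reference strands in proportion to their abundance. The Python sketch below makes that relationship concrete under an idealized model that is an assumption, not part of the text: all reference strands on a microparticle are occupied, both labeled populations hybridize with equal affinity, and each label's signal is proportional to the number of its strands in duplexes, scaled by a hypothetical per-dye brightness factor. The function names are illustrative.

```python
def duplex_counts(conc_a: float, conc_b: float, sites: int) -> tuple[float, float]:
    """Expected duplexes formed by labeled populations A and B on one microparticle.

    Idealized competitive-binding model (assumption for illustration): all
    `sites` reference strands are occupied and A and B bind with equal
    affinity, so the sites are split in proportion to their concentrations.
    """
    total = conc_a + conc_b
    if total <= 0:
        return 0.0, 0.0
    return sites * conc_a / total, sites * conc_b / total


def signal_ratio(conc_a: float, conc_b: float, sites: int,
                 brightness_a: float = 1.0, brightness_b: float = 1.0) -> float:
    """Ratio of label-A signal to label-B signal for one microparticle."""
    n_a, n_b = duplex_counts(conc_a, conc_b, sites)
    return (n_a * brightness_a) / (n_b * brightness_b)


# A gene expressed 3-fold higher in population A than in population B:
print(duplex_counts(3.0, 1.0, sites=10_000))  # (7500.0, 2500.0)
print(signal_ratio(3.0, 1.0, sites=10_000))   # 3.0
```

In this model the signal ratio does not depend on the number of reference strands per microparticle, which is why the relative optical signal, rather than absolute intensity, tracks relative expression.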

The present invention overcomes shortcomings in the art by providing compositions, methods, and kits for separating and identifying genes that are differentially expressed without requiring any prior analysis or knowledge of their sequences. The invention also permits differentially regulated genes to be separated from unregulated genes for analysis, thereby eliminating the need to analyze large numbers of unregulated genes in order to obtain information on the genes of interest.



BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1a and 1b illustrate FACS analysis of microparticles loaded with competitively hybridized DNA strands labeled with two different fluorescent dyes.

Figure 2 is a schematic representation of a flow chamber and detection apparatus for observing a planar array of microparticles loaded with restriction fragments for sequencing.




SUBSTITUTE SHEET (RULE 26) 



WO 99/35293    PCT/US99/00666



Figure 3a illustrates a preferred scheme for converting isolated messenger RNA (mRNA) into cDNA and inserting the cDNA into a tag-containing vector.

Figure 3b illustrates a preferred scheme for amplifying tag-cDNA conjugates out of a vector and loading the amplified conjugates onto microparticles.

Figure 3c illustrates a preferred scheme for isolating sorted cDNAs for cloning and sequencing.

Figures 4a and 4b illustrate alternative procedures for cloning differentially expressed cDNAs isolated by FACS sorting.

Figures 5a-e illustrate flow analysis data of microparticles carrying predetermined ratios of two differently labeled cDNAs.

Figure 6 illustrates flow analysis data of microparticles carrying differently labeled cDNAs from stimulated and unstimulated THP-1 cells.

Figure 7 illustrates flow analysis data of microparticles carrying labeled 
cDNAs derived from mRNA of low abundance in stimulated THP-1 cells. 
Figure 8 illustrates flow analysis data of microparticles carrying labeled cDNAs derived from mRNA of low abundance in human bone marrow.

Figure 9 illustrates flow analysis data of microparticles carrying differently labeled cDNAs from glucose-normal and glucose-starved muscle tissue.

Figure 10A illustrates an embodiment of the invention for constructing a reference nucleic acid population on microparticles.

Figure 10B illustrates an embodiment for using the reference library of Figure 10A to compare gene expression of two cell populations.
Definitions

"Complement" or "tag complement" as used herein in reference to oligonucleotide tags refers to an oligonucleotide to which an oligonucleotide tag specifically hybridizes to form a perfectly matched duplex or triplex. In embodiments where specific hybridization results in a triplex, the oligonucleotide tag may be selected to be either double stranded or single stranded. Thus, where triplexes are formed, the term "complement" is meant to encompass either a double stranded complement of a single stranded oligonucleotide tag or a single stranded complement of a double stranded oligonucleotide tag.
The term "oligonucleotide" as used herein includes linear oligomers of natural or modified monomers or linkages, including deoxyribonucleosides, ribonucleosides, anomeric forms thereof, peptide nucleic acids (PNAs), and the like, capable of specifically binding to a target polynucleotide by way of a regular pattern of monomer-to-monomer interactions, such as Watson-Crick type base pairing, base stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or the like. Usually monomers are linked by phosphodiester bonds or analogs thereof to form oligonucleotides ranging in size from a few monomeric units, e.g. 3-4, to several tens of monomeric units, e.g. 40-60. Whenever an oligonucleotide is represented by a sequence of letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'->3' order from left to right and that "A" denotes deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes thymidine, unless otherwise noted. Usually oligonucleotides of the invention comprise the four natural nucleotides; however, they may also comprise non-natural nucleotide analogs. It is clear to those skilled in the art when oligonucleotides having natural or non-natural nucleotides may be employed; e.g. where processing by enzymes is called for, oligonucleotides consisting of natural nucleotides are usually required.

"Perfectly matched" in reference to a duplex means that the poly- or oligonucleotide strands making up the duplex form a double stranded structure with one another such that every nucleotide in each strand undergoes Watson-Crick base pairing with a nucleotide in the other strand. The term also comprehends the pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine bases, and the like, that may be employed. In reference to a triplex, the term means that the triplex consists of a perfectly matched duplex and a third strand in which every nucleotide undergoes Hoogsteen or reverse Hoogsteen association with a base pair of the perfectly matched duplex. Conversely, a "mismatch" in a duplex between a tag and an oligonucleotide means that a pair or triplet of nucleotides in the duplex or triplex fails to undergo Watson-Crick and/or Hoogsteen and/or reverse Hoogsteen bonding.

As used herein, "nucleoside" includes the natural nucleosides, including 2'-deoxy and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker, DNA Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Analogs" in reference to nucleosides includes synthetic nucleosides having modified base moieties and/or modified sugar moieties, e.g. as described by Scheit, Nucleotide Analogs (John Wiley, New York, 1980); Uhlmann and Peyman, Chemical Reviews, 90: 543-584 (1990); or the like, with the only proviso that they are capable of specific hybridization. Such analogs include synthetic nucleosides designed to enhance binding properties, reduce complexity, increase specificity, and the like.

As used herein, "sequence determination" or "determining a nucleotide sequence" in reference to polynucleotides includes determination of partial as well as full sequence information of the polynucleotide. That is, the term includes sequence comparisons, fingerprinting, and like levels of information about a target polynucleotide, as well as the express identification and ordering of nucleosides, usually each nucleoside, in a target polynucleotide. The term also includes the determination of the identification, ordering, and locations of one, two, or three of the four types of nucleotides within a target polynucleotide. For example, in some embodiments sequence determination may be effected by identifying the ordering and locations of a single type of nucleotide, e.g. cytosines, within the target polynucleotide "CATCGC ..." so that its sequence is represented as a binary code, e.g. "100101 ..." for "C-(not C)-(not C)-C-(not C)-C ...", and the like.

As used herein, the term "complexity" in reference to a population of polynucleotides means the number of different species of polynucleotide present in the population.

As used herein, the term "relative gene expression" or "relative expression" in reference to a gene refers to the relative abundance of the same gene expression product, usually an mRNA, in different cells or tissue types.
DETAILED DESCRIPTION OF THE INVENTION
The present invention provides compositions, methods, and kits for analyzing relative gene expression in a single cell or tissue type, or in a plurality of cell and/or tissue types, of interest. The methods of the invention can be applied to polynucleotides derived from animals, plants, and microorganisms such as fungi, bacteria, mycoplasma, cyanobacteria, algae, and the like. Preferably, the polynucleotides are derived from animals, plants, or microorganisms involved in fermentation processes, with vertebrates and agricultural plants being most preferred. The plurality usually comprises a pair of cell or tissue types, such as a diseased tissue or cell type and a healthy tissue or cell type, or a cell or tissue type subjected to a stimulus or stress, e.g. a change of nutrients, temperature, or the like, and the corresponding cell or tissue type in an unstressed or unstimulated state. In another embodiment, the plurality can comprise a pair of cell or tissue types having homologous genes, such as cells or tissue from different organisms. The plurality may also include more than two cell or tissue types, such as would be required in a comparison of expression patterns of the same cell or tissue over time, e.g. liver cells after exposure of an organism to a candidate drug, organ cells of a test animal at different developmental states, and the like. Preferably, the plurality is 2 or 3 cell or tissue types; more preferably, it is 2 cell or tissue types.

The method of the invention typically comprises providing a reference population of nucleic acid sequences attached to separate solid phase supports in clonal subpopulations; providing at least one population of polynucleotides of expressed genes; hybridizing the population(s) of polynucleotides of expressed genes with the reference nucleic acid population; and detecting, and preferably sorting, each solid phase support according to a relative optical signal generated by the duplexes attached thereto.

Figure 10A illustrates an embodiment of the invention for constructing a reference nucleic acid population on microparticles, and Figure 10B illustrates an embodiment for using such a reference library to compare gene expression of two cell populations. Messenger RNA (mRNA) is extracted (1004) from cell populations (1000) and (1002) using conventional protocols to give two populations of polynucleotides, (1006) and (1008), respectively. The extraction reactions can be carried out separately or on a mixture of cell types. Preferably, the reactions are carried out separately so that the relative quantities of mRNA from the two populations can be more readily controlled. Portions of mRNA (1006) and mRNA (1008) are combined (1010), and cDNA library (1012) is constructed in vectors carrying a repertoire of oligonucleotide tags, in accordance with the procedure described in Brenner et al., U.S. patent 5,846,719. Preferably, equal portions of mRNA, i.e. equal molar quantities, are taken from each population of mRNA. A sample of vectors from library (1012) is taken and amplified, e.g. by polymerase chain reaction, transfection and cloning, or the like, after which the tag-cDNA conjugates (1014) carried by the vectors are excised or copied (1011) and then isolated. Loaded microparticles are then formed and prepared for use in competitive hybridization as follows (1018). The isolated tag-cDNA conjugates (1014), illustrated with oligonucleotide tags a, b, c, and d, are specifically hybridized to microparticles carrying their tag complements a', b', c', and d' (1016), respectively. The tag-cDNA conjugates are ligated to the tag complements so that at least one strand of each double stranded tag-cDNA conjugate is covalently attached to the microparticle. Microparticles carrying tag-cDNA conjugates are separated from those that do not carry tag-cDNA conjugates, preferably using a fluorescence-activated cell sorter (FACS), or like instrument. The non-covalently attached strand is then melted off and separated from the microparticles to yield microparticles (1020) carrying a reference nucleic acid population.

As illustrated in Figure 10B, gene expression of cells (1050) may be compared to that of cells (1052) by separately extracting (1054) mRNA (1056) and (1058) from each cell type. After construction of cDNA libraries (1062) and (1064) using conventional protocols, single stranded nucleic acid probes are generated from the respective cDNA populations (1062) and (1064), the probes preferably being labeled with optically distinguishable fluorescent dyes F (1068) and R (1066), e.g., fluorescein and rhodamine. Equal amounts of the labeled polynucleotides are mixed and hybridized (1072) to the complementary strands carried by the microparticles to form duplexes (1074). After hybridization is complete, microparticles carrying the duplexes thereby formed (1074) can be sorted (1076) in accordance with predetermined criteria, such as fluorescence ratio, fluorescence intensity, and/or the like. In such a manner, subpopulations of interest, e.g., those corresponding to up-regulated or down-regulated genes, can be isolated and further analyzed.
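Sorting by fluorescence ratio, as in step (1076), can be sketched in Python. This is a hypothetical gating rule rather than the logic of any particular sorter: the function name, the two-fold threshold, and the median normalization (which assumes that most microparticles carry unregulated genes, e.g. house-keeping genes) are all assumptions made for illustration. Log ratios are used so that higher signal in either channel is treated symmetrically.

```python
import math
import statistics


def classify_beads(signal_a, signal_b, fold=2.0):
    """Label each microparticle by its median-normalized log2(A/B) signal ratio.

    Assumptions (illustrative only): both channels are background-corrected
    and positive, and most beads carry unregulated genes, so the median
    log ratio is treated as "no change" and subtracted before gating.
    """
    if len(signal_a) != len(signal_b) or not signal_a:
        raise ValueError("need equal-length, non-empty signal lists")
    if any(v <= 0 for v in (*signal_a, *signal_b)):
        raise ValueError("signals must be positive")
    log_ratios = [math.log2(a / b) for a, b in zip(signal_a, signal_b)]
    center = statistics.median(log_ratios)
    threshold = math.log2(fold)
    labels = []
    for r in log_ratios:
        shifted = r - center
        if shifted >= threshold:        # at least `fold` higher in channel A
            labels.append("higher_in_A")
        elif shifted <= -threshold:     # at least `fold` higher in channel B
            labels.append("higher_in_B")
        else:
            labels.append("unchanged")
    return labels


print(classify_beads([100, 120, 90, 400, 25], [100, 100, 100, 100, 100]))
# ['unchanged', 'unchanged', 'unchanged', 'higher_in_A', 'higher_in_B']
```

Because the median log ratio is subtracted, a uniform difference in dye brightness or loading between the two channels shifts every bead equally and does not change the labels.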
For analysis in accordance with the invention, messenger RNA (mRNA) is extracted from the cells or tissues of interest using conventional protocols, as disclosed in, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Edition (Cold Spring Harbor Laboratory, New York). Preferably, the populations of mRNAs to be compared are converted into populations of labeled cDNAs by reverse transcription in the presence of a labeled nucleoside triphosphate using conventional protocols, e.g. Schena et al., Science 270: 467-470 (1995); DeRisi et al., Science 278: 680-686 (1997); or the like, prior to hybridization to a reference DNA population.

An important feature of the invention is that genes whose expression levels change, or differ from those of the other cells or tissues being examined, may be analyzed separately from those that are not regulated or otherwise altered in response to whatever stress or condition is being studied. As described below, in the preferred embodiment gene products from the cells or tissues of interest are competitively hybridized with a reference population consisting of DNA sequences attached in clonal subpopulations to separate microparticles. As a result, microparticles carrying labeled gene products in ratios indicating differential expression may be manipulated and analyzed separately from those carrying labeled gene products in ratios indicating no change in expression, e.g. "house-keeping" genes, genes encoding structural proteins, or the like.

Another important feature of the invention is that the identity of the nucleic acid being analyzed, e.g., genomic DNA or gene products such as cDNA, mRNA, RNA transcripts, or the like, need not be known prior to analysis. After relative expression is determined, cDNAs derived from expressed genes may be identified by direct sequencing on the solid phase support, preferably a microparticle, using a number of different sequencing approaches. For identification, only a portion of each cDNA need be sequenced. In many cases, the portion may be as small as nine or ten nucleotides, e.g. Velculescu et al. (cited above). Preferably, entire subpopulations of differentially expressed genes are sequenced simultaneously using massively parallel signature sequencing (MPSS), or a similar parallel analysis technique. In a preferred embodiment, this is conveniently accomplished by providing a reference population of DNA sequences such that each such sequence is attached to a separate microparticle in a clonal subpopulation. As used herein, the phrase "clonal subpopulation" refers to multiple copies of a single kind of polynucleotide selected from a population of interest, such as a cDNA library constructed from mRNA extracted from a cell or tissue whose gene expression is being analyzed. Such clonal subpopulations may be formed in a number of ways, including by separate amplification of a polynucleotide and attachment by conventional attachment chemistries, e.g., Hermanson, Bioconjugate Techniques (Academic Press, New York, 1996). As explained more fully below, clonal subpopulations are preferably formed by so-called "solid phase cloning," disclosed in Brenner, U.S. patent 5,604,097, and Brenner et al., U.S. patent 5,846,719, which are incorporated herein by reference. Briefly, such clonal subpopulations are formed by hybridizing an amplified sample of tag-DNA conjugates onto one or more solid phase supports, e.g., separate and unconnected microparticles, so that individual microparticles, or different regions of a larger support, have multiple copies of the same DNA attached.
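The logic of solid phase cloning can be illustrated with a toy Python model. It assumes, for illustration, that each microparticle carries the complement of exactly one tag, so that every amplified tag-DNA conjugate is captured by the particle matching its tag; the function names and the example tags and cDNA labels are hypothetical. A particle stays clonal unless two different DNAs happen to share a tag.

```python
from collections import Counter, defaultdict


def load_microparticles(conjugates):
    """Sort amplified (tag, dna) conjugates onto microparticles by tag.

    Hypothetical model: each microparticle carries the complement of exactly
    one tag, so every conjugate hybridizes to the particle matching its tag.
    Returns {tag: Counter of DNA species captured on that particle}.
    """
    particles = defaultdict(Counter)
    for tag, dna in conjugates:
        particles[tag][dna] += 1
    return dict(particles)


def non_clonal(particles):
    """Tags whose particle captured more than one DNA species."""
    return sorted(tag for tag, species in particles.items() if len(species) > 1)


amplified = [("a", "cDNA-1")] * 3 + [("b", "cDNA-2")] * 2 + [("c", "cDNA-3"), ("c", "cDNA-4")]
particles = load_microparticles(amplified)
print(particles["a"])         # Counter({'cDNA-1': 3})
print(non_clonal(particles))  # ['c']
```

In this toy example, particle "c" is not clonal because two different cDNAs were joined to the same tag; keeping the sample small relative to the tag repertoire makes such cases rare.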

The DNA component of the tag-DNA conjugate can be cDNA, genomic DNA, a fragment of cDNA or genomic DNA, or a synthetic DNA, such as, for example, an oligonucleotide. Preferably, the DNA component of the tag-DNA conjugate is a cDNA or a fragment of genomic DNA ("gDNA"). The number of copies of a cDNA or gDNA in a clonal subpopulation may vary widely in different embodiments depending on several factors, including the density of tag complements on the solid phase supports, the size and composition of the microparticles used, the duration of the hybridization reaction, the complexity of the tag repertoire, the concentration of individual tags, the tag-DNA sample size, the labeling means for generating optical signals, the particle sorting means, the signal detection system, and the like.

Guidance for making design choices relating to these factors is readily available in the literature on flow cytometry, fluorescence microscopy, molecular biology, hybridization technology, and related disciplines, as represented by the references cited herein. Preferably, the number of copies of a cDNA or a gDNA in a clonal subpopulation is sufficient to permit FACS detection and/or sorting of microparticles, wherein fluorescent signals are generated by one or more fluorescent dye molecules carried by the cDNAs attached to the microparticles. Typically, this number can be as low as a few thousand, e.g. 3,000-5,000, when a fluorescent molecule such as fluorescein is used, and as low as several hundred, e.g. 800-8000, when a rhodamine dye, such as rhodamine 6G, is used. More preferably, when loaded microparticles are detected and/or sorted by FACS or like instruments, clonal subpopulations consist of at least 10^4 copies of a cDNA or gDNA; and most preferably, in such embodiments, clonal subpopulations consist of at least 10^5 copies of a cDNA or gDNA.

Labeled cDNAs or RNAs from the cells or tissues to be compared are competitively hybridized to the DNA sequences of the reference DNA population using conventional hybridization conditions, e.g. such as disclosed in Schena et al. (cited above); DeRisi et al. (cited above); or Shalon, Ph.D. Thesis entitled "DNA Microarrays," Stanford University (1995). After hybridization, an optical signal is generated by each of the two labeled species of cDNAs or RNAs so that a relative optical signal is determined for each microparticle. Preferably, such optical signals are generated and measured in a fluorescence-activated cell sorter, or like instrument, which permits the microparticles whose relative optical signals fall within a predetermined range of values to be sorted and accumulated. The microparticles loaded with cDNAs or RNAs generating relative optical signals in the desired range may be isolated and identified by sequencing, such as with MPSS, as described more fully below.

Preferably, clonal subpopulations of cDNAs or other DNA molecules derived from RNA are attached to microparticles using the processes illustrated in Figures 3a and 3b. First, as illustrated in Figure 3a, mRNA (300) is extracted from a cell or tissue source of interest using conventional techniques and is converted into cDNA (309) with ends appropriate for inserting into vector (316). Preferably, primer (302), having a 5' biotin (305) and a poly(dT) region (306), is annealed to mRNA strands (300) so that the first strand of cDNA (309) is synthesized with a reverse transcriptase in the presence of the four deoxyribonucleoside triphosphates. Preferably, 5-methyldeoxycytidine triphosphate is used in place of deoxycytidine triphosphate in the first strand synthesis, so that cDNA (309) is hemi-methylated, except for the region corresponding to primer (302). This allows primer (302) to contain a non-methylated restriction site for releasing the cDNA from a support. The use of biotin in primer (302) is not critical to the invention, and other molecular capture techniques, or moieties, can be used, e.g. triplex capture, or the like. Region (303) of primer (302) preferably contains a sequence of nucleotides that results in the formation of restriction site r1 (304) upon synthesis of the second strand of cDNA (309). After isolation by binding the biotinylated cDNAs to streptavidin supports, e.g. Dynabeads M-280 (Dynal, Oslo, Norway), or the like, cDNA (309) is preferably cleaved with a restriction endonuclease that is insensitive to hemimethylation (of the Cs) and that recognizes site r2 (307). Preferably, r2 is a four-base recognition site, corresponding to Dpn II or a like enzyme, which ensures that substantially all of the cDNAs are cleaved and that the same defined end is produced in all of the cDNAs. After washing, the cDNAs are then cleaved with a restriction endonuclease recognizing r1, releasing fragment (308), which is purified using standard techniques, e.g. ethanol precipitation, polyacrylamide gel electrophoresis, or the like. After resuspension in an appropriate buffer, fragment (308) is directionally ligated into vector (316), which carries tag (310) and a cloning site with ends (312) and (314). Preferably, vector (316) is prepared with a "stuffer" fragment in the cloning site to aid in the isolation of a fully cleaved vector for cloning.
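The role of the four-base site r2 can be illustrated in Python. The sketch assumes Dpn II, which recognizes GATC and cuts 5' of the G, as the r2 enzyme; it ignores methylation and operates on the mRNA-sense sequence of a cDNA upstream of the poly(A) tail. The function name and example sequences are hypothetical. Because the biotinylated primer holds the 3' end of each cDNA on the support, the piece retained after r2 digestion runs from the 3'-most GATC to the poly(A) end. A four-base site occurs on average once every 4^4 = 256 bases in random sequence, which is why nearly every cDNA is cut and given the same GATC-defined end.

```python
from typing import Optional

DPNII_SITE = "GATC"  # Dpn II recognition sequence; it cuts 5' of the G (^GATC)


def bead_bound_fragment(sense_seq: str) -> Optional[str]:
    """Return the 3'-terminal fragment left on the support after Dpn II digestion.

    `sense_seq` is the mRNA-sense sequence of a cDNA, 5'->3', up to (not
    including) the poly(A) tail. The support holds the 3' end, so the retained
    fragment starts at the 3'-most GATC. Returns None if no site is present.
    """
    seq = sense_seq.upper()
    cut = seq.rfind(DPNII_SITE)
    if cut == -1:
        return None  # uncleaved: no defined end is produced
    return seq[cut:]


print(bead_bound_fragment("AAGATCTTTGGATCCCAAA"))  # GATCCCAAA
print(bead_bound_fragment("AAAATTTTCCCCGGGG"))     # None
```

In the procedure described above, the retained fragment would then be released by cleavage at r1 within the primer region, yielding fragment (308) with one Dpn II end and one r1 end for directional cloning.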

Preparation of the tag-cDNA conjugates is not limited to the method described above and can readily be achieved in a variety of ways using conventional molecular biology techniques. For example, cDNA can be prepared by conventional methods and isolated by gel electrophoresis; this method is less preferred, in part because it would bias the size distribution of the reference population. The tag can be attached by ligation of adaptors, by PCR with an oligo(dT) primer and a random primer, or by RACE technology (Bertling et al. (1993) PCR Methods Appl. 3:95-99; Frohman, M.A. (1993) Methods Enzymol. 218:340-356; Marathon™ cDNA Amplification Kit, Clontech Laboratories, Inc.). Attachment of the tag by cloning into a vector, as described above, is preferred for several reasons, including the ability to generate large quantities of the reference population (versus RACE, which typically yields only µg quantities) and the ability to check the sequence of the tag.

After formation of a library of tag-cDNA conjugates, a sample of host cells is
usually plated to determine the number of recombinants per unit volume of culture
medium. The size of sample taken for further processing preferably depends on the
size of tag repertoire used in the library construction. As taught by Brenner et al.,
U.S. patent 5,846,719 and Brenner et al., U.S. patent 5,604,097, a sample preferably
includes a number of conjugates equivalent to about one percent of the size of the tag
repertoire in order to minimize the selection of "doubles," i.e. two or more conjugates
carrying the same tag and different cDNAs. Thus, for a tag repertoire consisting of a
concatenation of eight 4-nucleotide "words" selected from a minimally
cross-hybridizing set of eight words, the size of the repertoire is 8^8, or about 1.7 x 10^7 tags.
Accordingly, with such a repertoire, a sample of about 1.7 x 10^5 conjugate-containing
vectors is preferably selected for amplification and further processing as illustrated in
Figure 3b.
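The repertoire and sample-size arithmetic in the paragraph above can be checked with a few lines of Python. This is a minimal sketch; the helper names `repertoire_size` and `sample_size` are hypothetical, and the one percent sampling fraction is the preferred value stated in the text.

```python
def repertoire_size(words_in_set: int, words_per_tag: int) -> int:
    """Number of distinct tags when each of words_per_tag positions
    is filled independently from a set of words_in_set words."""
    return words_in_set ** words_per_tag


def sample_size(repertoire: int, fraction: float = 0.01) -> int:
    """Number of conjugates to sample, expressed as a fraction of the repertoire."""
    return round(repertoire * fraction)


size = repertoire_size(8, 8)   # eight words per tag, each from a set of eight
print(size)                    # 16777216, i.e. about 1.7 x 10^7
print(sample_size(size))       # 167772, i.e. about 1.7 x 10^5
```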

Preferably, tag-cDNA conjugates are carried in vector (330), which comprises
the following sequence of elements: first primer binding site (332), restriction site r3
(334), oligonucleotide tag (336), junction (338), cDNA (340), restriction site r4 (342),
and second primer binding site (344). After a sample is taken of the vectors
containing tag-cDNA conjugates, the following steps are implemented: the tag-cDNA
conjugates are preferably amplified from vector (330) by use of biotinylated
primer (348) and labeled primer (346) in a conventional polymerase chain reaction
(PCR) in the presence of 5-methyldeoxycytidine triphosphate, after which the
resulting amplicon is isolated by streptavidin capture. Restriction site r3 preferably
corresponds to a rare-cutting restriction endonuclease, such as Pac I, Not I, Fse I,
Pme I, Swa I, or the like, which permits the captured amplicon to be released from a
support with minimal probability of cleavage occurring at a site internal to the cDNA
of the amplicon. Junction (338), which is illustrated as the sequence:

5'-...GGGCCC...
3'-...CCCGGG...



causes the DNA polymerase "stripping" reaction to be halted at the G triplet when an
appropriate DNA polymerase is used with dGTP. Briefly, in the "stripping" reaction,
the 3'→5' exonuclease activity of a DNA polymerase, preferably T4 DNA
polymerase, is used to render the tag of the tag-cDNA conjugate single stranded, as
taught by Brenner, U.S. patent 5,604,097; and Kuijper et al., Gene, 112: 147-155
(1992). In the preferred embodiment where sorting is accomplished by formation of
duplexes between tags and tag complements, tags of tag-cDNA conjugates are
rendered single stranded by first selecting words that contain only three of the four
natural nucleotides, and then by preferentially digesting the three nucleotide types
from the tag-cDNA conjugate in the 3'→5' direction with the 3'→5' exonuclease
activity of a DNA polymerase. In the preferred embodiment, oligonucleotide tags are
designed to contain only A's, G's, and T's; thus, tag complements (including that in the
double stranded tag-cDNA conjugate) consist of only A's, C's, and T's. When the
released tag-cDNA conjugates are treated with T4 DNA polymerase in the presence
of dGTP, the complementary strands of the tags are "stripped" away to the first G. At
that point, the incorporation of dG by the DNA polymerase balances the exonuclease
activity of the DNA polymerase, effectively halting the "stripping" reaction. From
the above description, it is clear that one of ordinary skill could make many
alternative design choices for carrying out the same objective, i.e. rendering the tags
single stranded. Such choices could include selection of different enzymes, different
compositions of words making up the tags, and the like.
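The halting rule can be illustrated with a simplified string model. This sketch assumes that the 3'→5' exonuclease removes bases one at a time from the 3' end of the complementary strand and stops at the first G, as described for T4 DNA polymerase in the presence of dGTP; it ignores reaction kinetics and the occasional overshoot that the fill-in reaction is meant to repair. The example tag (three words rather than eight) and the bases following junction (338) are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_strand(top_5_to_3: str) -> str:
    """Base-by-base complement, written 3'->5' so that it aligns under the top strand."""
    return top_5_to_3.translate(COMPLEMENT)


def strip_until_g(strand_3_to_5: str) -> tuple[str, str]:
    """Model of the "stripping" reaction on a strand written 3'->5'.

    Bases are removed from the 3' end until the first G; with dGTP present,
    excision of that G is balanced by its re-incorporation, so digestion halts.
    Returns (removed, remaining), both written 3'->5'.
    """
    stop = strand_3_to_5.find("G")
    if stop == -1:                     # no G anywhere: nothing halts digestion
        return strand_3_to_5, ""
    return strand_3_to_5[:stop], strand_3_to_5[stop:]


tag = "AGTT" + "GATA" + "TGAT"        # hypothetical three-word tag using only A, G and T
assert set(tag) <= set("AGT")
top = tag + "GGGCCC" + "ACGTAC"       # tag, junction (338), start of a hypothetical cDNA
removed, remaining = strip_until_g(complementary_strand(top))
print(removed)        # TCAACTATACTACCC: tag complement plus the CCC opposite GGG
print(remaining[:3])  # GGG: the G triplet at which the reaction halts
```

Because the tag complement contains no G, the first G the exonuclease meets is the one supplied by the junction, so the region left single stranded on the top strand is the tag plus GGG.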
When the "stripping" reaction is quenched, the result is duplex (356) with
single stranded tag (357). After isolation, steps (358) are implemented: the tag-cDNA
conjugates are hybridized to tag complements attached to microparticles, a fill-in
reaction is carried out to fill any gap between the complementary strand of the
tag-cDNA conjugate and the 5' end of tag complement (362) attached to microparticle
(360), and the complementary strand of the tag-cDNA conjugate is covalently bonded
to the 5' end (363) of tag complement (362) by treating with a ligase. This
embodiment requires, of course, that the 5' end of the tag complement be
phosphorylated, e.g. by a kinase, such as T4 polynucleotide kinase, or the like. The
fill-in reaction is preferably carried out because the "stripping" reaction does not
always halt at the first G. Preferably, the fill-in reaction uses a DNA polymerase
lacking 5'→3' exonuclease activity and strand displacement activity, such as T4 DNA
polymerase. Also preferably, all four dNTPs are used in the fill-in reaction, in case
the "stripping" extended beyond the G triplet.

As explained further below, the tag-cDNA conjugates are hybridized to the
full repertoire of tag complements. That is, among the population of microparticles,
there are microparticles having every tag sequence of the entire repertoire. Thus, the
tag-cDNA conjugates will hybridize to tag complements on only about one percent of
the microparticles. Microparticles to which tag-cDNA conjugates have been hybridized
are referred to herein as "loaded microparticles." For greater efficiency, loaded
microparticles are preferably separated from unloaded microparticles for further
processing. Such separation is conveniently accomplished by use of a
fluorescence-activated cell sorter (FACS), or similar instrument that permits rapid
manipulation and sorting of large numbers of individual microparticles. In the
embodiment illustrated in Figure 3b, a fluorescent label, e.g. FAM (a fluorescein
derivative, Haugland, Handbook of Fluorescent Probes and Research Chemicals, Sixth
Edition (Molecular Probes, Eugene, OR, 1996)), is attached by way of primer (346).

The tag-cDNA can be attached to the tag complement on the microparticles by
a procedure omitting or modifying many of the steps discussed above. For example,
instead of amplifying the tag-cDNA from vector (330), the tag-cDNA can be cleaved
from the vector by restriction digest, stripped, and ligated directly to the tag
complement on the microparticles. This procedure omits (1) labeling the tag-cDNA
with biotin and FAM, (2) amplifying the tag-cDNA, and (3) isolating the amplicon by
streptavidin capture. If desired, loaded microparticles can be isolated by hybridizing
with a FAM-labeled primer.

As shown in Figure 3c, after FACS, or like sorting (380), loaded
microparticles (360) are isolated, treated to remove label (345), and treated to melt off
the non-covalently attached strand. Label (345) is removed or inactivated so that it
does not interfere with the labels of the competitively hybridized strands. Preferably,
the tag-cDNA conjugates are treated with a restriction endonuclease recognizing site
r4 (342), which cleaves the tag-cDNA conjugates adjacent to primer binding site (344),
thereby removing label (345) carried by the "bottom" strand, i.e. the strand having its 5'
end distal to the microparticle. Preferably, this cleavage results in microparticle (360)
with double stranded tag-cDNA conjugate (384) having protruding strand (385).
3'-labeled adaptor (386) is then annealed and ligated to protruding strand (385), after
which the loaded microparticles are re-sorted by means of the 3'-label, and the strand
carrying the 3'-label is melted off to leave a covalently attached single strand of the
cDNA (392) ready to accept denatured cDNAs or mRNAs from differentially
expressed genes. Preferably, the 3'-labeled strand is melted off with sodium
hydroxide treatment, or treatment with a like reagent.

Clonal subpopulations of gDNAs can be attached to microparticles in a similar
manner. First, genomic DNA is isolated from a cell or tissue source of interest using
conventional techniques and is cleaved with at least one restriction endonuclease,
which preferably cleaves at a four-base recognition site, such as, for example, Dpn II,
Sau3A I, Aci I, Alu I, Bfa I, BstU I, Hae III, Hha I, HinP1 I, Hpa II, Mbo I, Mse I,
Msp I, Nla III, Rsa I, Taqα I, Tsp509 I, and the like. Preferably, the cleaved fragment
has an overhang of at least one base. Alternatively, genomic DNA fragments can be
prepared by shearing or sonicating the isolated genomic DNA. The tag can then be
linked to the gDNA in a number of ways, including random primed PCR with primers
containing the tag sequence or cloning into a vector containing a tag in a manner
similar to that described above for a cDNA reference population. A label such as
FAM can be attached in order to monitor the loading of the microparticles. In some
instances, directional attachment onto the microparticles can be achieved by
amplifying the gDNA with a primer having a consensus sequence, such as, for
example, the TATA box, or a sequence complementary to a consensus sequence.
When using a gDNA reference population for evaluating gene expression, it may be
desirable to reduce noncoding sequence and introns in the gDNA library. For
example, a large gDNA library of about 60 x 10^ microparticles can be reduced to 
about 30,000-40,000 by culling, using cDNA pools as a probe. 



Oligonucleotide Tags for Identification and Solid Phase Cloning

An important feature of the invention is the use of oligonucleotide tags which
are members of a minimally cross-hybridizing set of oligonucleotides to construct
reference DNA populations attached to solid phase supports, preferably
microparticles. The sequence of each oligonucleotide of a minimally cross-hybridizing
set differs from the sequence of every other member of the same set by at least two
nucleotides. Thus, each member of such a set cannot form a duplex (or triplex) with
the complement of any other member with fewer than two mismatches. Complements
of oligonucleotide tags, referred to herein as "tag complements," may comprise
natural nucleotides or non-natural nucleotide analogs. When oligonucleotide tags are
used for sorting, as is the case for constructing a reference DNA population, tag
complements are preferably attached to solid phase supports. Oligonucleotide tags,
when used with their corresponding tag complements, provide a means of enhancing
specificity of hybridization for sorting, tracking, or labeling molecules, especially
polynucleotides, such as cDNAs or mRNAs derived from expressed genes.
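The defining property of a minimally cross-hybridizing set, that every member differs from every other member at a minimum number of positions, can be checked as a minimum pairwise Hamming distance. This is a minimal sketch under the simplifying assumption that sequences are compared only in register; it does not model shifted alignments or partial duplexes, and the function names and example words are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(words: list[str]) -> int:
    """Smallest Hamming distance between any two members of the set (needs >= 2 words)."""
    return min(hamming(a, b) for a, b in combinations(words, 2))


def is_minimally_cross_hybridizing(words: list[str], min_diff: int = 2) -> bool:
    """True if every member differs from every other member at >= min_diff positions."""
    return min_pairwise_distance(words) >= min_diff


print(is_minimally_cross_hybridizing(["AAGT", "GTAA", "TGTG"]))  # True: all pairs differ at 4 positions
print(is_minimally_cross_hybridizing(["AAGT", "AAGA"]))          # False: only one difference
```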

Minimally cross-hybridizing sets of oligonucleotide tags and tag complements
may be synthesized either combinatorially or individually depending on the size of the
set desired and the degree to which cross-hybridization is sought to be minimized (or,
stated another way, the degree to which specificity is sought to be enhanced). For
example, a minimally cross-hybridizing set may consist of a set of individually
synthesized 10-mer sequences that differ from each other by at least 4 nucleotides,
such set having a maximum size of 332, when constructed as disclosed in Brenner et
al., U.S. patent 5,604,097. Alternatively, a minimally cross-hybridizing set of
oligonucleotide tags may also be assembled combinatorially from subunits which
themselves are selected from a minimally cross-hybridizing set. For example, a set of
minimally cross-hybridizing 12-mers differing from one another by at least three
nucleotides may be synthesized by assembling 3 subunits selected from a set of
minimally cross-hybridizing 4-mers that each differ from one another by three
nucleotides. Such an embodiment gives a maximally sized set of 9^3, or 729, 12-mers.
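One way to obtain nine 4-mers that each differ from one another at exactly three positions is the ternary tetracode, whose codewords have the form (a, b, a+b, a+2b) modulo 3. This is a swapped-in textbook construction, not necessarily the subunit set used in the cited patents, and the mapping of the digits 0, 1, 2 onto A, G, T is an assumption chosen to match the three-letter tag alphabet described earlier. The sketch builds the subunit set, concatenates three subunits, and checks the counts and distances stated above.

```python
from itertools import combinations, product

ALPHABET = "AGT"  # assumed mapping of the digits 0, 1, 2 onto nucleotides


def tetracode_words() -> list[str]:
    """Nine 4-mers from the ternary tetracode: (a, b, a+b, a+2b) mod 3."""
    words = []
    for a, b in product(range(3), repeat=2):
        digits = (a, b, (a + b) % 3, (a + 2 * b) % 3)
        words.append("".join(ALPHABET[d] for d in digits))
    return words


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(p != q for p, q in zip(x, y))


subunits = tetracode_words()
tags = ["".join(parts) for parts in product(subunits, repeat=3)]  # all 12-mers

print(len(subunits), {hamming(x, y) for x, y in combinations(subunits, 2)})  # 9 {3}
print(len(tags), min(hamming(x, y) for x, y in combinations(tags, 2)))       # 729 3
```

Two distinct 12-mers must differ in at least one subunit position, and any two distinct subunits differ at three positions, so the minimum distance of three carries over from the subunits to the concatenated tags.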

When synthesized combinatorially, an oligonucleotide tag can be randomized
at individual positions along its length. Preferably, however, the oligonucleotide tag
consists of a plurality of subunits, each subunit consisting of an oligonucleotide of 3
to 9 nucleotides in length, wherein each subunit is selected from the same minimally
cross-hybridizing set. In such embodiments, the number of oligonucleotide tags
available depends on the number of subunits per tag and on the length of the subunits.
An oligonucleotide tag can also consist of a plurality of subunits with additional
nucleotides on either terminus of the oligonucleotide. The additional nucleotides can
be random and/or can comprise a restriction site. Such a structure ensures the
instability of a duplex or triplex having a mismatch at a terminus of the
oligonucleotide. Preferably, the oligonucleotide comprises a recognition site for a
rare-cutting restriction endonuclease on at least one end. In a preferred embodiment,
the oligonucleotide comprises an AT-rich restriction site, such as a Pac I site, on one
end. A Bsp120 I site is a preferred site on the other end.

Complements of oligonucleotide tags attached to one or more solid phase
supports are used to sort polynucleotides from a mixture of polynucleotides each
containing a tag. Such tag complements are synthesized on the surface of a solid
phase support, such as a bead, preferably microscopic, or a specific location on an
array of synthesis locations on a single support, such that populations of identical, or
substantially identical, sequences are produced in specific regions. That is, the
surface of each support, in the case of a bead, or of each region, in the case of an
array, is derivatized by copies of only one type of tag complement having a particular
sequence. The population of such beads or regions contains a repertoire of tag
complements each with distinct sequences. As used herein in reference to
oligonucleotide tags and tag complements, the term "repertoire" means the total
number of different oligonucleotide tags or tag complements that are employed for
solid phase cloning (sorting) or identification. A repertoire may consist of a
minimally cross-hybridizing set of oligonucleotides that are individually synthesized,
or it may consist of a concatenation of oligonucleotides each selected from the same
set of minimally cross-hybridizing oligonucleotides. In the latter case, the repertoire
is preferably synthesized combinatorially.

Preferably, tag complements are synthesized combinatorially on
microparticles, so that each microparticle has attached many copies of the same tag
complement. A wide variety of microparticle supports may be used with the
invention, including microparticles made of controlled pore glass (CPG), highly
cross-linked polystyrene, acrylic copolymers, cellulose, nylon, dextran, latex,
polyacrolein, and the like, disclosed in the following exemplary references: Meth.
Enzymol., Section A, pages 11-147, vol. 44 (Academic Press, New York, 1976); U.S.
patents 4,678,814; 4,413,070; and 4,046,720; and Pon, Chapter 19, in Agrawal, editor,
Methods in Molecular Biology, Vol. 20 (Humana Press, Totowa, NJ, 1993).
Microparticle supports further include commercially available nucleoside-derivatized
CPG and polystyrene beads (e.g. available from PE Applied Biosystems, Foster City,
CA); derivatized magnetic beads; polystyrene grafted with polyethylene glycol (e.g.,
TentaGel™, Rapp Polymere, Tübingen, Germany); and the like. Microparticles may
also consist of dendrimeric structures, such as disclosed by Nilsen et al., U.S. patent
5,175,270. Generally, the size and shape of a microparticle is not critical; however,
microparticles in the size range of a few, e.g. 1-2, to several hundred, e.g. 200-1000,
µm diameter are preferable, as they facilitate the construction and manipulation of
large repertoires of oligonucleotide tags with minimal reagent and sample usage.
Preferably, glycidyl methacrylate (GMA) beads available from Bangs Laboratories
(Carmel, IN) are used as microparticles in the invention. Such microparticles are
useful in a variety of sizes and are available with a variety of linkage groups for
synthesizing tags and/or tag complements. More preferably, 5 µm diameter GMA
beads are employed.

In a preferred embodiment, polynucleotides to be sorted, or cloned onto a solid
phase support, each have an oligonucleotide tag attached, such that different
polynucleotides have different tags. This condition is achieved by employing a
repertoire of tags substantially greater than the population of polynucleotides and by
taking a sufficiently small sample of tagged polynucleotides from the full ensemble of
tagged polynucleotides. After such sampling, when the populations of supports and
polynucleotides are mixed under conditions which permit specific hybridization of the
oligonucleotide tags with their respective complements, identical polynucleotides sort
onto particular beads or regions. Of course, the sampled tag-polynucleotide
conjugates are preferably amplified, e.g. by polymerase chain reaction, cloning in a
plasmid, RNA transcription, or the like, to provide sufficient material for subsequent
analysis.

Oligonucleotide tags are employed for two different purposes in certain
embodiments of the invention. First, they are employed to implement solid
phase cloning, as described in Brenner, U.S. patent 5,604,097; and International
patent application PCT/US96/09513, wherein large numbers of polynucleotides, e.g.
several thousand to several hundred thousand, are sorted from a mixture into clonal
subpopulations of identical polynucleotides on one or more solid phase supports for
analysis. Second, they are employed to deliver (or accept) labels to identify
polynucleotides, such as encoded adaptors, that number in the range of a few tens to a
few thousand, e.g. as disclosed in Albrecht et al., International patent application
PCT/US97/09472. For the former use, large numbers, or repertoires, of tags are
typically required, and therefore synthesis of individual oligonucleotide tags is
difficult. In these embodiments, combinatorial synthesis of the tags is preferred. On
the other hand, where extremely large repertoires of tags are not required, such as for
delivering labels to a plurality of kinds or subpopulations of polynucleotides in the
range of 2 to a few tens, as with encoded adaptors, oligonucleotide tags of a minimally
cross-hybridizing set may be separately synthesized, as well as synthesized
combinatorially.

Sets containing several hundred to several thousands, or even several tens of
thousands, of oligonucleotides may be synthesized directly by a variety of parallel
synthesis approaches, e.g. as disclosed in Frank et al., U.S. patent 4,689,405; Frank et
al., Nucleic Acids Research, 11: 4365-4377 (1983); Matson et al., Anal. Biochem.,
224: 110-116 (1995); Fodor et al., International application PCT/US93/04145; Pease
et al., Proc. Natl. Acad. Sci., 91: 5022-5026 (1994); Southern et al., J. Biotechnology,
35: 217-227 (1994); Brennan, International application PCT/US94/05896; Lashkari et
al., Proc. Natl. Acad. Sci., 92: 7912-7915 (1995); or the like.

Preferably, tag complements in mixtures, whether synthesized combinatorially
or individually, are selected to have similar duplex or triplex stabilities to one another
so that perfectly matched hybrids have similar or substantially identical melting
temperatures. This permits mismatched tag complements to be more readily
distinguished from perfectly matched tag complements in the hybridization steps, e.g.
by washing under stringent conditions. For combinatorially synthesized tag
complements, minimally cross-hybridizing sets may be constructed from subunits that
make approximately the same contribution to duplex stability as every other
subunit in the set. Guidance for carrying out such selections is provided by published
techniques for selecting optimal PCR primers and calculating duplex stabilities, e.g.
Rychlik et al., Nucleic Acids Research, 17: 8543-8551 (1989) and 18: 6409-6412
(1990); Breslauer et al., Proc. Natl. Acad. Sci., 83: 3746-3750 (1986); Wetmur, Crit.
Rev. Biochem. Mol. Biol., 26: 227-259 (1991); and the like. A minimally
cross-hybridizing set of oligonucleotides can be screened by additional criteria, such as
GC content, distribution of mismatches, theoretical melting temperature, and the like, to
form a subset which is also a minimally cross-hybridizing set.
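Screening candidate subunits for similar duplex stability can be sketched by grouping them by an estimated melting temperature. The estimate below is the Wallace rule (2 °C per A or T, 4 °C per G or C), a crude approximation for short oligonucleotides swapped in here for illustration; the nearest-neighbor methods cited above are more accurate. The candidate pool (all 4-mers over A, G and T) and the function names are hypothetical.

```python
from itertools import product


def wallace_tm(seq: str) -> int:
    """Crude melting-temperature estimate in degrees C: 2 per A/T, 4 per G/C."""
    gc = sum(base in "GC" for base in seq)
    return 2 * (len(seq) - gc) + 4 * gc


def group_by_tm(words: list[str]) -> dict[int, list[str]]:
    """Group candidate words by estimated Tm so that one isothermal group can be chosen."""
    groups: dict[int, list[str]] = {}
    for w in words:
        groups.setdefault(wallace_tm(w), []).append(w)
    return groups


candidates = ["".join(p) for p in product("AGT", repeat=4)]  # 81 candidate 4-mers
groups = group_by_tm(candidates)
print(sorted((tm, len(ws)) for tm, ws in groups.items()))
# [(8, 16), (10, 32), (12, 24), (14, 8), (16, 1)]
```

The 32 words containing a single G share one estimated Tm; that group could then be screened for pairwise distance to yield a subset that is still minimally cross-hybridizing.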

The oligonucleotide tags of the invention and their complements are
conveniently synthesized on an automated DNA synthesizer, e.g. an Applied
Biosystems, Inc. (Foster City, California) model 392 or 394 DNA/RNA Synthesizer,
using standard chemistries, such as phosphoramidite chemistry, e.g. disclosed in the
following references: Beaucage and Iyer, Tetrahedron, 48: 2223-2311 (1992); Molko
et al., U.S. patent 4,980,460; Koster et al., U.S. patent 4,725,677; Caruthers et al.,
U.S. patents 4,415,732; 4,458,066; and 4,973,679; and the like.

Oligonucleotide tags for sorting may range in length from 12 to 60 nucleotides
or basepairs. Preferably, oligonucleotide tags range in length from 18 to 40
nucleotides or basepairs. More preferably, oligonucleotide tags range in length from
25 to 40 nucleotides or basepairs. In terms of preferred and more preferred numbers
of subunits, these ranges may be expressed as follows:

Numbers of Subunits in Tags in Preferred Embodiments

Monomers     Nucleotides in Oligonucleotide Tag
in Subunit   (12-60)          (18-40)          (25-40)
3            4-20 subunits    6-13 subunits    8-13 subunits
4            3-15 subunits    4-10 subunits    6-10 subunits
5            2-12 subunits    3-8 subunits     5-8 subunits
6            2-10 subunits    3-6 subunits     4-6 subunits



Most preferably, oligonucleotide tags for sorting are single stranded and specific
hybridization occurs via Watson-Crick pairing with a tag complement.

Preferably, repertoires of single stranded oligonucleotide tags for sorting 
contain at least 100 members; more preferably, repertoires of such tags contain at 
least 1000 members; and most preferably, repertoires of such tags contain at least 
10,000 members. 

Preferably, the length of single stranded tag complements for delivering labels
is between 8 and 20 nucleotides. More preferably, the length is between 9 and 15
nucleotides.

In embodiments where specific hybridization occurs via triplex formation,
coding of tag sequences follows the same principles as for duplex-forming tags;
however, there are further constraints on the selection of subunit sequences.
Generally, third strand association via Hoogsteen type of binding is most stable along
homopyrimidine-homopurine tracts in a double stranded target. Usually, base triplets
form in T-A*T or C-G*C motifs (where "-" indicates Watson-Crick pairing and "*"
indicates Hoogsteen type of binding); however, other motifs are also possible. For
example, Hoogsteen base pairing permits parallel and antiparallel orientations
between the third strand (the Hoogsteen strand) and the purine-rich strand of the
duplex to which the third strand binds, depending on conditions and the composition
of the strands. There is extensive guidance in the literature for selecting appropriate
sequences, orientation, conditions, nucleoside type (e.g. whether ribose or
deoxyribose nucleosides are employed), base modifications (e.g. methylated cytosine,
and the like) in order to maximize, or otherwise regulate, triplex stability as desired in
particular embodiments. Conditions for annealing single-stranded or duplex tags to
their single-stranded or duplex complements are well known, e.g. Ji et al., Anal.
Chem. 65: 1323-1328 (1993); Cantor et al., U.S. patent 5,482,836; and the like. Use
of triplex tags in sorting has the advantage of not requiring a "stripping" reaction with
polymerase to expose the tag for annealing to its complement.
An exemplary tag library for sorting is shown below (SEQ ID NO: 1).

Left Primer
5'-AGAATTCGGGCCTTAATTAA

                                    Bsp120 I
5'-AGAATTCGGGCCTTAATTAA-[4(A,G,T)8]-GGGCCC-
   TCTTAAGCCCGGAATTAATT-[4(T,C,A)8]-CCCGGG-
    ↑          ↑
 Eco RI      Pac I

       Bbs I          Bam HI
-GCATAAGTCTTCXXX...XXXGGATCCGAGTGAT-3'
-CGTATTCAGAAGXXX...XXXCCTAGGCTCACTA
                      CCTAGGCTCACTA-5'
                      Right Primer

Formula I



The flanking regions of the oligonucleotide tag may be engineered to contain
restriction sites, as exemplified above, for convenient insertion into and excision from
cloning vectors. Optionally, the right or left primers may be synthesized with a biotin
attached (using conventional reagents, e.g. available from Clontech Laboratories, Palo
Alto, CA) to facilitate purification after amplification and/or cleavage. Preferably, for
making tag-fragment conjugates, the above library is inserted into a conventional
cloning vector, such as pUC19, or the like. Optionally, the vector containing the tag
library may contain a "stuffer" region, "XXX...XXX," which facilitates isolation of
fragments fully digested with, for example, Bam HI and Bbs I.
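The restriction sites called out in Formula I can be located programmatically on an assembled top strand. This sketch uses the standard recognition sequences of the named enzymes; the eight-word tag is a hypothetical example drawn from A, G and T, and the stuffer region is omitted. Because Bbs I recognizes GAAGAC, its site appears on the top strand as the reverse complement GTCTTC, so each site is searched on both strands.

```python
SITES = {
    "Eco RI": "GAATTC",
    "Pac I": "TTAATTAA",
    "Bsp120 I": "GGGCCC",
    "Bbs I": "GAAGAC",
    "Bam HI": "GGATCC",
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(top: str) -> dict[str, list[int]]:
    """0-based top-strand positions at which each site occurs on either strand."""
    hits = {}
    for name, site in SITES.items():
        positions = set()
        for motif in {site, reverse_complement(site)}:
            start = top.find(motif)
            while start != -1:
                positions.add(start)
                start = top.find(motif, start + 1)
        hits[name] = sorted(positions)
    return hits


# Hypothetical eight-word tag, 4 nucleotides per word, using only A, G and T.
tag = "GATT" "TGAG" "AATG" "GTGA" "TAGT" "GGTA" "ATGT" "TTGA"
top = ("AGAATTCGGGCCTTAATTAA"     # left primer region: Eco RI, Pac I
       + tag
       + "GGGCCC"                 # Bsp120 I
       + "GCATAAGTCTTC"           # Bbs I (reverse complement on this strand)
       + "GGATCCGAGTGAT")         # Bam HI, right primer region
print(find_sites(top))
# {'Eco RI': [1], 'Pac I': [12], 'Bsp120 I': [52], 'Bbs I': [64], 'Bam HI': [70]}
```

Each enzyme is found exactly once, which is the property that makes the sites useful for directional insertion and excision.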

An important aspect of the invention is the sorting and attachment of
populations of DNA sequences, e.g. from a cDNA library, to microparticles or to
separate regions on a solid phase support such that each microparticle or region has
substantially only one kind of sequence attached; that is, such that the DNA sequences
are present in clonal subpopulations. This objective is accomplished by ensuring that
substantially all different DNA sequences have different tags attached. This condition,
in turn, is brought about by taking only a sample of the full ensemble of tag-DNA
sequence conjugates for analysis. (It is acceptable that identical DNA sequences have
different tags, as it merely results in the same DNA sequence being operated on or
analyzed twice.) Such sampling can be carried out overtly (for example, by
taking a small volume from a larger mixture) after the tags have been attached to the
DNA sequences; it can be carried out inherently as a secondary effect of the
techniques used to process the DNA sequences and tags; or sampling can be carried
out both overtly and as an inherent part of processing steps.

If a sample of $n$ tag-DNA sequence conjugates is randomly drawn from a
reaction mixture, as could be effected by taking a sample volume, the probability of
drawing conjugates having the same tag is described by the Poisson distribution,
$P(r) = e^{-\lambda}\lambda^{r}/r!$, where $r$ is the number of conjugates having the same tag and
$\lambda = np$, where $p$ is the probability of a given tag being selected. If $n = 10^{6}$ and
$p = 1/(1.67 \times 10^{7})$ (for example, if eight 4-base words described in Brenner et al. were
employed as tags), then $\lambda \approx 0.060$ and $P(2) \approx 1.7 \times 10^{-3}$. Thus, a sample of one
million molecules gives rise to relatively few doubles, and smaller samples give fewer
still. Such a sample is readily obtained by serial dilutions of a mixture containing
tag-fragment conjugates.
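The Poisson quantities for tag doubles can be evaluated directly. This is a minimal sketch, assuming uniform tag usage so that p = 1/(repertoire size); the helper names are hypothetical. Computing λ = np from n = 10^6 and p = 1/(1.67 x 10^7) gives λ ≈ 0.060 and P(2) ≈ 1.7 x 10^-3. The sketch also reports e^(-λ), the approximate probability that a given sampled conjugate shares its tag with no other conjugate, which can be compared with the ninety-five percent figure given below.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """P(r) = e^(-lam) * lam^r / r!"""
    return math.exp(-lam) * lam ** r / math.factorial(r)


def doubles_summary(n: int, repertoire: float) -> dict[str, float]:
    lam = n / repertoire                  # lambda = n * p, with p = 1 / repertoire
    return {
        "lambda": lam,
        "P(2)": poisson_pmf(2, lam),
        # approx. probability that no other sampled conjugate carries the same tag
        "unique_fraction": math.exp(-lam),
    }


for n in (10**6, 167_000):               # one million, and about 1% of the repertoire
    s = doubles_summary(n, 1.67e7)
    print(n, f"{s['lambda']:.4f}", f"{s['P(2)']:.2e}", f"{s['unique_fraction']:.3f}")
# 1000000 0.0599 1.69e-03 0.942
# 167000 0.0100 4.95e-05 0.990
```

At one million molecules roughly 94% of conjugates carry a tag not shared with any other; at the one percent sampling level the figure rises to about 99%.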

As used herein, the term "substantially all" in reference to attaching tags to
molecules, especially polynucleotides, is meant to reflect the statistical nature of the
sampling procedure employed to obtain a population of tag-molecule conjugates
essentially free of doubles. Preferably, at least ninety-five percent of the DNA
sequences have unique tags attached.

Preferably, DNA sequences are conjugated to oligonucleotide tags by inserting
the sequences into a conventional cloning vector carrying a tag library. For example,
cDNAs may be constructed having a Bsp120 I site at their 5' ends and, after digestion
with Bsp120 I and another enzyme such as Sau3A I or Dpn II, may be directionally
inserted into a pUC19 carrying the tags of Formula I to form a tag-cDNA library,
which includes every possible tag-cDNA pairing. A sample is taken from this library
for amplification and sorting. Sampling may be accomplished by serial dilutions of
the library, or by simply picking plasmid-containing bacterial hosts from colonies.
After amplification, the tag-cDNA conjugates may be excised from the plasmid.

After the oligonucleotide tags are prepared for specific hybridization, e.g. by
rendering them single stranded as described above, the polynucleotides are mixed
with microparticles containing the complementary sequences of the tags under
conditions that favor the formation of perfectly matched duplexes between the tags
and their complements. There is extensive guidance in the literature for creating these
conditions. Exemplary references providing such guidance include Wetmur, Critical
Reviews in Biochemistry and Molecular Biology, 26: 227-259 (1991); Sambrook et
al., Molecular Cloning: A Laboratory Manual, 2nd Edition (Cold Spring Harbor
Laboratory, New York, 1989); and the like. Preferably, the hybridization conditions
are sufficiently stringent so that only perfectly matched sequences form stable
duplexes. Under such conditions the polynucleotides specifically hybridized through
their tags may be ligated to the complementary sequences attached to the
microparticles. Finally, the microparticles are washed to remove polynucleotides with
unligated and/or mismatched tags.

Specificity of the hybridizations of tag to their complements may be increased 
by taking a sufBciently small sample so that both a high percentage of tags in the 

20 sample are unique and the nearest neighbors of substantially all the tags in a sample 
differ by at least two words. This latter condition may be met by taking a sample that 
contains a number of tag-polynucleotide conjugates that is about 0.1 percent or less of 
the size of the repertoire being employed. For example, if tags are constructed with 
eight words a repertoire of 8^, or about L67 x 10^, tags and tag complements are 

25 produced. In a library of tag-DNA sequence conjugates as described above, a 0.1 

percent sample means that about 16,700 different tags are present If this were loaded 
directly onto a repertoire-equivalent of microparticles, or in this example a sample of 
1.67 X 10^ microparticles, thai only a sparse subset of the sampled microparticles 
would be loaded. Preferably, loaded microparticles may be separated fix>m unloaded 

30 microparticles by a fluorescence activated cell sorting (FACS) instrument using 
conventional protocols after DNA sequences have been fluorescently labeled and 
denatured. After loading and FACS sorting, the label may be cleaved prior use or 
other analysis of the attached DNA sequences. 
A reference DNA population may consist of any set of DNA sequences whose
frequencies in different test populations are to be compared. Preferably, a
reference DNA population for use in the analysis of gene expression in a plurality of
cells or tissues is constructed by generating a cDNA library from each of the cells or
tissues whose gene expression is being compared. This may be accomplished either
by pooling the mRNA extracted from the various cells and/or tissues, or by pooling
the cDNAs of separately constructed cDNA libraries.
Alternatively, a reference DNA population may be constructed from genomic DNA.
The objective is to obtain a set of DNA sequences that includes all of the
sequences that could possibly be expressed in any of the cells or tissues being
analyzed. Once the DNA sequences making up a reference DNA population are
obtained, they must be conjugated with oligonucleotide tags for solid phase cloning.
Preferably, the DNA sequences are prepared so that they can be inserted into a vector
carrying an appropriate tag repertoire, as described above, to form a library of
tag-DNA sequence conjugates. A sample of conjugates is taken from this library,
amplified, and loaded onto microparticles. It is important that the sample be large
enough so that there is a high probability that all of the different types of DNA
sequences are represented on the loaded microparticles. For example, if among a
plurality of cells being compared a total of about 25,000 genes are expressed, then a
sample of about five-fold this number, or about 125,000 tag-DNA sequence
conjugates, should be taken so that any given DNA sequence is represented
among the loaded microparticles with about 99% probability, e.g.
Sambrook et al. (cited above).
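The five-fold rule of thumb follows from the standard library-completeness (Clarke-Carbon) estimate of the kind given in Sambrook et al., N = ln(1 - P) / ln(1 - f). The sketch below assumes every expressed gene is equally represented among the conjugates (f = 1/25,000), which real libraries only approximate because transcript abundances differ widely.

```python
import math


def clones_needed(probability: float, fraction: float) -> float:
    """Clarke-Carbon estimate: conjugates N needed so that a sequence present at
    relative frequency `fraction` is sampled at least once with `probability`."""
    return math.log(1.0 - probability) / math.log(1.0 - fraction)


def probability_represented(n_sampled: int, fraction: float) -> float:
    """Probability that a sequence at relative frequency `fraction` appears
    at least once among `n_sampled` independent draws."""
    return 1.0 - (1.0 - fraction) ** n_sampled


genes = 25_000
f = 1.0 / genes                                  # assumes equal representation of every gene
print(round(clones_needed(0.99, f)))             # conjugates for 99% confidence per sequence
print(probability_represented(5 * genes, f))     # per-sequence probability with a five-fold sample
```

Under that assumption about 115,000 conjugates give 99% confidence for any one sequence, and a 125,000-conjugate sample gives about 99.3%; the probability that every one of the 25,000 sequences is captured simultaneously is much lower.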

In another embodiment, the reference population can comprise a set of
polynucleotides encoding a specific set or sets of proteins selected from the group
consisting of cell cycle proteins, signal transduction pathway proteins, oncogene
products, tumor suppressors, kinases, phosphatases, transcription factors, growth
factor receptors, growth factors, extracellular matrix proteins, proteases, cytoskeletal
proteins, membrane receptors, Rb pathway proteins, p53 pathway proteins, proteins
involved in metabolism, proteins involved in cellular responses to stress, cytokines,
proteins involved in DNA damage and repair, and proteins involved in apoptosis.
Such polynucleotides are typically attached to the solid phase supports through
oligonucleotides having a unique sequence per solid support, but such polynucleotides
can also be attached to the solid phase supports through an oligonucleotide with a
sequence common to each solid phase support, such as, for example, a
polyadenylated oligonucleotide.

Preferably, after the tag-DNA sequence conjugates are sampled, they are
amplified by PCR using a fluorescently labeled primer to provide sufficient material
to load onto the tag complements of the microparticles and to provide a means for
distinguishing loaded from unloaded microparticles, as disclosed in Brenner et al.
(cited above). Preferably, the PCR primer also contains a sequence which allows the
generation of a restriction site of a rare-cutting restriction endonuclease, such as Pac I,
in the double stranded product so that the fluorescent label may be cleaved from the
end of the cDNA prior to the competitive hybridization of labeled DNA strands
derived from the cells or tissues being studied. After such loading, the specifically
hybridized tag-DNA sequence conjugates are ligated to the tag complements and the
loaded microparticles are separated from the unloaded microparticles by FACS. The
fluorescent label is cleaved from the DNA strands of the loaded microparticles and
the non-covalently attached strand is removed by denaturing with heat, formamide,
NaOH, and/or like means, using conventional protocols. The microparticles are
then ready for competitive hybridization.

Competitive Hybridization and Light-Generating Labels

Gene expression products, e.g. mRNA or cDNA, from the cells and/or tissues
being analyzed are isolated. The expression products are labeled so as to distinguish
their source. Preferably, the products from each source comprise a label different from
the label comprised by the products of any other source, e.g., each having a unique
and distinguishable emission frequency. Alternatively, the products of one source can
be left unlabeled. The expression products can be labeled by conventional techniques,
e.g. DeRisi et al. (cited above), or the like. Preferably, a light-generating label is
incorporated into cDNAs reverse transcribed from the extracted mRNA, or an
oligonucleotide tag is attached for providing a labeled tag complement for
identification. A large number of light-generating labels are available, including
fluorescent, colorimetric, chemiluminescent, and electroluminescent labels.
Generally, such labels produce an optical signal which may comprise an absorption
frequency, an emission frequency, an intensity, a signal lifetime, or a combination of
such characteristics. Preferably, fluorescent labels are employed, either by direct
incorporation of fluorescently labeled nucleoside triphosphates or by indirect
application by incorporation of a capture moiety, such as biotinylated nucleoside
triphosphates or an oligonucleotide tag, followed by complexing with a moiety
capable of generating a fluorescent signal, such as a streptavidin-fluorescent dye
conjugate or a fluorescently labeled tag complement. Preferably, the optical signal
detected from a fluorescent label is an intensity at one or more characteristic emission
frequencies. Selection of fluorescent dyes and means for attaching or incorporating
them into DNA strands are well known, e.g. DeRisi et al. (cited above); Matthews et
al., Anal. Biochem., Vol. 169, pgs. 1-25 (1988); Haugland, Handbook of Fluorescent
Probes and Research Chemicals (Molecular Probes, Inc., Eugene, 1992); Keller and
Manak, DNA Probes, 2nd Edition (Stockton Press, New York, 1993); Eckstein,
editor, Oligonucleotides and Analogues: A Practical Approach (IRL Press, Oxford,
1991); Wetmur, Critical Reviews in Biochemistry and Molecular Biology, 26: 227-259
(1991); Ju et al., Proc. Natl. Acad. Sci., 92: 4347-4351 (1995); Ju et al.,
Nature Medicine, 2: 246-249 (1996); and the like.

Preferably, light-generating labels are selected so that their respective optical
signals can be related to the quantity of labeled DNA strands present and so that the
optical signals generated by different light-generating labels can be compared.
Measurement of the emission intensities of fluorescent labels is the preferred means
of meeting this design objective. For a given selection of fluorescent dyes, relating
their emission intensities to the respective quantities of labeled DNA strands requires
consideration of several factors, including fluorescent emission maxima of the
different dyes, quantum yields, emission bandwidths, absorption maxima, absorption
bandwidths, nature of excitation light source(s), and the like. Guidance for making
fluorescent intensity measurements and for relating them to quantities of analytes is
available in the literature relating to chemical and molecular analysis, e.g. Guilbault,
editor, Practical Fluorescence, Second Edition (Marcel Dekker, New York, 1990);
Pesce et al., editors, Fluorescence Spectroscopy (Marcel Dekker, New York, 1971);
White et al., Fluorescence Analysis: A Practical Approach (Marcel Dekker, New
York, 1970); and the like. As used herein, the term "relative optical signal" means a
ratio of signals from different light-generating labels that can be related to a ratio of
differently labeled DNA strands of identical, or substantially identical, sequences that
form duplexes with a complementary reference DNA strand. Preferably, a relative
optical signal is a ratio of fluorescence intensities of two or more different fluorescent
dyes.
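The "relative optical signal" can be illustrated with a minimal Python sketch. It assumes each dye's response (signal per unit quantity of labeled strand) has been measured, e.g. from a calibration series, so that intensities can be converted to quantities before taking the ratio; the response and background parameters are hypothetical inputs, not values from the text.

```python
import math


def relative_optical_signal(intensity_a: float, intensity_b: float,
                            response_a: float = 1.0, response_b: float = 1.0,
                            background_a: float = 0.0, background_b: float = 0.0) -> float:
    """Ratio of labeled-strand quantities inferred from two fluorescence channels.

    response_*   : hypothetical calibration factor (signal per unit quantity of labeled DNA)
    background_* : hypothetical per-channel background to subtract
    """
    qa = (intensity_a - background_a) / response_a
    qb = (intensity_b - background_b) / response_b
    if qa <= 0 or qb <= 0:
        raise ValueError("background-corrected signal must be positive in both channels")
    return qa / qb


ratio = relative_optical_signal(1200.0, 800.0, response_a=2.0, response_b=1.0)
print(ratio, math.log2(ratio))   # ratio of inferred quantities and its log2
```

Without the response correction the raw intensity ratio here would be 1.5; after correcting for a dye that is twice as bright per strand, the inferred strand ratio is 0.75.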

Competitive hybridization between the labeled DNA strands derived from the
plurality of cells or tissues is carried out by applying equal quantities of the labeled
DNA strands from each such source to the microparticles loaded with the reference
DNA population in a conventional hybridization reaction. The particular amounts of
labeled DNA strands added to the competitive hybridization reaction vary widely
depending on the embodiment of the invention. Factors influencing the selection of
such amounts include the quantity of microparticles used, the type of microparticles
used, the loading of reference DNA strands on the microparticles, the complexity of
the populations of labeled DNA strands, and the like. Hybridization is competitive in
that differently labeled DNA strands with identical, or substantially identical,
sequences compete to hybridize to the same complementary reference DNA strands.
The competitive hybridization conditions are selected so that the proportion of labeled
DNA strands forming duplexes with complementary reference DNA strands reflects,
and preferably is directly proportional to, the amount of that DNA strand in its
population in comparison with the amount of the competing DNA strands of identical
sequence in their respective populations. Thus, if first and second differently
labeled DNA strands with identical sequence are competing for hybridization with a
complementary reference DNA strand such that the first labeled DNA strand is at a
concentration of 1 ng/µl and the second labeled DNA strand is at a concentration of 2
ng/µl, then at equilibrium it is expected that one third of the duplexes formed with the
reference DNA would include first labeled DNA strands and two thirds of the
duplexes would include second labeled DNA strands. Guidance for selecting
hybridization conditions is provided in many references, including Keller and Manak
(cited above); Wetmur (cited above); Hames et al., editors, Nucleic Acid
Hybridization: A Practical Approach (IRL Press, Oxford, 1985); and the like.
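In the idealized case described above, each labeled population's share of the duplexes is its concentration divided by the summed concentration of all competitors. A minimal Python sketch under that direct-proportionality assumption, using exact fractions:

```python
from fractions import Fraction


def expected_duplex_shares(concentrations):
    """Expected share of reference duplexes formed by each competing labeled population,
    assuming equilibrium occupancy directly proportional to concentration."""
    total = sum(concentrations)
    if total == 0:
        raise ValueError("at least one labeled population must be present")
    return [Fraction(c) / total for c in concentrations]


shares = expected_duplex_shares([1, 2])   # ng/µl of the first and second labeled strands
print(shares)
```

For the 1 ng/µl versus 2 ng/µl example this returns shares of 1/3 and 2/3.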

Another aspect of the invention is a kit for analyzing differentially expressed
genes, comprising a mixture of microparticles, each microparticle having a population
of identical single stranded nucleic acid molecules attached thereto, the single
stranded nucleic acid molecules being different on each microparticle and comprising
a polynucleotide derived from an mRNA of at least one cell or tissue source.
Preferably, each of said nucleic acid molecules further comprises an oligonucleotide
tag in juxtaposition with said polynucleotide and positioned between said
microparticle and said polynucleotide. The kit can further comprise a population of
cDNA molecules from at least one of said cell or tissue sources, reagents for labeling
the cDNA populations, reagents for performing competitive hybridization, and the
like. If desired, the cDNA molecules in the kit are provided in fluorescently labeled
form. The kit can contain additional components for performing competitive
hybridization, such as, for example, hybridization buffers, PCR buffers and standards,
and the like. The kit can further comprise at least one container or several containers
for each of the components and can comprise printed instructions for use in analyzing
differentially expressed genes.

The invention also provides a kit for preparing a reference population,
comprising a plurality of microparticles having oligonucleotide tag complements
attached thereto, the oligonucleotide tag complement sequence being different on
each microparticle. The kit can further comprise a plurality of vectors comprising a
library of tags having sequences complementary to the tag complements. The kit can
further comprise a population of polynucleotides from at least one cell or tissue
source, preferably cDNAs. When a population of polynucleotides is included,
preferably the population of polynucleotides is contained in a container separate from
said plurality of microparticles. The kit can also contain reagents for preparing the
reference population, such as, for example, adaptors, labels, polymerase, dNTPs,
labeled dNTPs, PCR buffers, and the like, as well as printed instructions for
preparing the reference population.



Flow Sorting of Microparticles with Up-Regulated and/or Down-Regulated Gene Products
After labeled polynucleotides are competitively hybridized to a reference
population on microparticles, the microparticles may be analyzed and/or sorted in a
number of ways depending on the chemical and/or physical properties of the
microparticles and the attached sequences. For example, microparticles of interest
may be mechanically separated by micromanipulators, magnetic microparticles may
be sorted by adjusting or manipulating magnetic fields, charged microparticles may be
manipulated by electrophoresis, or the like. The following references provide
guidance for selecting means for analyzing and/or sorting microparticles: Pace, U.S.
Patent 4,908,112; Saur et al., U.S. Patent 4,710,472; Senyei et al., U.S. Patent
4,230,685; Wilding et al., U.S. Patent 5,637,469; Penniman et al., U.S. Patent
4,661,225; Karnaukhov et al., U.S. Patent 4,354,114; Abbott et al., U.S. Patent
5,104,791; Gavin et al., PCT publication WO 97/40383; and the like. Preferably,
microparticles containing fluorescently labeled DNA strands are classified
and sorted by a commercially available FACS instrument, e.g. Van Dilla et
al., Flow Cytometry: Instrumentation and Data Analysis (Academic Press, New
York, 1985); Fulwyler et al., U.S. Patent 3,710,933; Gray et al., U.S. Patent
4,361,400; Dolbeare et al., U.S. Patent 4,812,394; and the like. For fluorescently
labeled DNA strands competitively hybridized to a reference strand, the FACS
instrument preferably has multiple fluorescent channel capabilities. Preferably, upon
excitation with one or more high intensity light sources, such as a laser, a mercury arc
lamp, or the like, each microparticle will generate fluorescent signals, usually
fluorescence intensities, related to the quantity of labeled DNA strands from each cell
or tissue type carried by the microparticle. As shown in Figure 1a of Example 1,
when fluorescent intensities of each microparticle are plotted on a two-dimensional
graph, microparticles indicating equal expression levels will be on or near the
diagonal (100) of the graph. Up-regulated and down-regulated genes will appear in
the off-diagonal regions (112). Such microparticles are readily sorted by commercial
FACS instruments by graphically defining sorting parameters to enclose one or both
off-diagonal regions (112) as shown in Figure 1b. Thus, microparticles can be sorted
according to their relative optical signal and, if desired, collected for further analysis
by accumulating those microparticles generating a signal within a predetermined
range of values corresponding to a difference in gene expression among the different
cell or tissue sources.
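A FACS instrument applies these gates graphically, but the same classification can be sketched in software. The Python example below assumes background-corrected intensities scaled so that equal expression falls on the diagonal, and uses a symmetric fold-change cutoff (2-fold here) as a stand-in for operator-drawn off-diagonal gates; the threshold and region labels are hypothetical.

```python
import math
from typing import List, Tuple


def classify_particles(events: List[Tuple[float, float]], fold_threshold: float = 2.0) -> List[str]:
    """Assign each microparticle to the diagonal or to an off-diagonal region.

    events: (channel_1, channel_2) intensities per microparticle, assumed
            background-corrected and scaled so equal expression lies on the diagonal.
    fold_threshold: hypothetical gate width; particles within this fold of the
            diagonal are counted as equally expressed.
    """
    cutoff = math.log2(fold_threshold)
    labels = []
    for ch1, ch2 in events:
        log_ratio = math.log2(ch1 / ch2)
        if log_ratio >= cutoff:
            labels.append("higher_in_source_1")
        elif log_ratio <= -cutoff:
            labels.append("higher_in_source_2")
        else:
            labels.append("diagonal")
    return labels


events = [(1000.0, 950.0), (4000.0, 500.0), (300.0, 2400.0)]
print(classify_particles(events))
```

A log ratio is used because it treats up- and down-regulation symmetrically: an 8-fold excess in either channel lies the same distance from the diagonal.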



Flow Sorting of Microparticles According to the Abundance of Nucleic Acid Sequences from which the Polynucleotides are Derived

Microparticles containing fluorescently labeled DNA strands can also be
classified and sorted according to the abundance of the gene products from which
they are derived. The abundance of a nucleic acid sequence can be determined by the
methods described above for determining relative gene expression and can be
correlated with the level of intensity of the optical signal generated by the
polynucleotides bound to the microparticles. A lower intensity is indicative of a rarer
nucleic acid sequence, such as a rare gene product. Rare genes are genes encoding an
mRNA which is present at about 100 copies per cell or less, with increasing
preference for less than about 50 copies to less than about 25 copies, with less than
about 10 copies per cell being most preferred. Rare genes can be isolated by
collecting microparticles with low fluorescent intensities as shown in Examples 9 and 10.
The collected microparticles typically comprise less than about 5% of the total
microparticles, with increasing preference for less than about 2.5%, 1%, to 0.5%, with
less than about 0.1% being most preferred.
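Collecting the dimmest microparticles amounts to a percentile gate on signal intensity. The Python sketch below keeps the lowest-intensity fraction of events (0.5% in the usage line); the synthetic intensities are made up for illustration.

```python
import math
from typing import List, Sequence


def dimmest_fraction(intensities: Sequence[float], fraction: float) -> List[int]:
    """Indices of the microparticles in the lowest-intensity `fraction` of events,
    mimicking a gate that collects the dimmest particles (putative rare transcripts)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, math.floor(len(intensities) * fraction))
    order = sorted(range(len(intensities)), key=lambda i: intensities[i])
    return sorted(order[:n_keep])


intensities = [float(v) for v in range(1000, 0, -1)]   # synthetic values, 1000 down to 1
print(dimmest_fraction(intensities, 0.005))
```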

Alternatively, since hybridization rates are proportional to the abundance of a
nucleic acid sequence, less abundant nucleic acid sequences can be isolated by setting
the hybridization conditions such that nucleic acid sequences present at a lower
abundance in a cell or tissue source remain unhybridized. Suitable hybridization
conditions include those used for producing normalized cDNA libraries
(Patanjali et al., Proc. Natl. Acad. Sci. USA, 88: 1943-1947 (1991)). For example, rare
genes can be isolated by collecting unhybridized DNA after allowing a limited
period of time for hybridization of the abundant DNA species.

Repetitive sequences can often complicate the mapping and analysis of
polymorphisms. Repetitive sequences exist due to the presence in the genome of
transposons, retrotransposons, retroviruses, short interspersed repetitive elements
(SINEs) such as Alu sequences, satellite DNA, minisatellite DNA, megasatellite
DNA, and the like. Repetitive sequences can be removed from a DNA population as
described above by sorting rapidly hybridizing DNA species away from DNA species
that are slower to hybridize. Preferably, the unhybridized population is substantially
enriched in polynucleotides derived from non-repetitive nucleic acid sequences.
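The kinetic argument in the two paragraphs above can be made quantitative with idealized second-order reassociation kinetics, the relationship behind Cot analysis and cDNA normalization: a species at initial concentration C0 remains single stranded in the fraction C/C0 = 1/(1 + k·C0·t). The rate constant, concentrations, and time in the Python sketch are arbitrary illustrative values.

```python
def single_stranded_fraction(c0: float, t: float, k: float) -> float:
    """Fraction of a species still unhybridized after time t under ideal
    second-order kinetics: C/C0 = 1 / (1 + k * C0 * t)."""
    return 1.0 / (1.0 + k * c0 * t)


k = 1.0    # arbitrary rate constant (illustrative units)
t = 10.0   # arbitrary hybridization time
abundant = single_stranded_fraction(c0=100.0, t=t, k=k)   # e.g. a repetitive or highly expressed species
rare = single_stranded_fraction(c0=0.1, t=t, k=k)         # a low-abundance species
print(f"abundant: {abundant:.3f} unhybridized, rare: {rare:.3f} unhybridized")
```

The abundant (or repetitive) species is almost entirely duplexed when half of the rare species is still single stranded, which is the difference that selection of the unhybridized fraction relies on.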

Another aspect of the invention is a kit for analyzing and/or isolating nucleic 
acid sequences with respect to their abundance comprising microparticles prepared as 
described above and printed instructions for use. 

and another for the stepwise sequencing of the end of a DNA fragment (e.g. Brenner,
U.S. patent 5,599,675 and Albrecht et al., International patent application
PCT/US97/09472). After an initial digestion of a target polynucleotide with a first
restriction endonuclease, restriction fragments are ligated to oligonucleotide tags as
described below, and in Brenner et al., International application PCT/US96/09513, so
that the resulting tag-fragment conjugates may be sampled, amplified, and sorted onto
separate solid phase supports by specific hybridization of the oligonucleotide tags
with their tag complements.

Once an amplified sample of DNA fragments is sorted onto solid phase
supports to form homogeneous populations of substantially identical fragments, the
ends of the fragments are preferably sequenced with an adaptor-based method of
DNA sequencing that includes repeated cycles of ligation, identification, and
cleavage, such as the method described in Brenner, U.S. patent 5,599,675. In further
preference, adaptors used in the sequencing method each have a protruding strand and
an oligonucleotide tag selected from a minimally cross-hybridizing set of
oligonucleotides, as taught by Albrecht et al., International patent application
PCT/US97/09472. Such adaptors are referred to herein as "encoded adaptors."
Encoded adaptors whose protruding strands form perfectly matched duplexes with the
complementary protruding strands of a fragment are ligated. After ligation, the
identity and ordering of the nucleotides in the protruding strand is determined, or
"decoded," by specifically hybridizing a labeled tag complement, or "de-coder," to its
corresponding tag on the ligated adaptor.



nuclease recognition site of a nuclease whose cleavage site is separate from its
recognition site; (b) identifying one or more nucleotides at the end of the fragment by
the identity of the encoded adaptor ligated thereto; (c) cleaving the fragment with a
nuclease recognizing the nuclease recognition site of the encoded adaptor such that
the fragment is shortened by one or more nucleotides; and (d) repeating said steps (a)
through (c) until said nucleotide sequence of the end of the fragment is determined.
In the identification step, successive sets of tag complements, or "de-coders," are
specifically hybridized to the respective tags carried by encoded adaptors ligated to
the ends of the fragments. The type and sequence of nucleotides in the protruding
strands of the polynucleotides are identified by the label carried by the specifically
hybridized de-coder and the set from which the de-coder came, as described below.
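The cycle of steps (a) through (d) can be simulated as a toy model in Python. The model assumes one encoded adaptor per possible k-nucleotide protruding strand, ligation only on a perfect match, a de-coder that reports the ligated adaptor's protruding strand, and cleavage that removes exactly k nucleotides per cycle; the tag names (`tag0`, `tag1`, ...) are hypothetical, and the sketch abstracts away the nuclease chemistry of Brenner, U.S. patent 5,599,675.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def build_encoded_adaptors(k: int) -> dict:
    """Hypothetical adaptor set: one tag name per possible k-nt protruding strand."""
    return {"".join(p): f"tag{i}" for i, p in enumerate(product("ACGT", repeat=k))}


def sequence_fragment_end(fragment: str, k: int, cycles: int) -> str:
    adaptors = build_encoded_adaptors(k)                          # protruding strand -> tag
    decoders = {tag: strand for strand, tag in adaptors.items()}  # tag -> protruding strand
    end, calls = fragment, []
    for _ in range(cycles):
        if len(end) < k:
            break
        overhang = end[:k]  # the fragment's protruding strand, modeled as its first k bases
        # (a) ligation: only the perfectly complementary adaptor ligates
        tag = adaptors[reverse_complement(overhang)]
        # (b) identification: the de-coder bound to this tag reveals the adaptor's strand
        calls.append(reverse_complement(decoders[tag]))
        # (c) cleavage: the fragment is shortened, here by exactly k nucleotides
        end = end[k:]
    return "".join(calls)  # (d) the loop repeats (a)-(c) for the requested cycles


print(sequence_fragment_end("AAGCTTGCAC", k=4, cycles=2))
```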

Identification of Sorted Genes by Conventional Sequencing

Gene products carried by microparticles may be identified after sorting, e.g. by
FACS, using conventional DNA sequencing protocols. Suitable templates for such
sequencing may be generated in several different ways starting from the sorted
microparticles carrying differentially expressed gene products. For example, the
reference DNA attached to an isolated microparticle may be used to generate labeled
extension products by cycle sequencing, e.g. as taught by Brenner, International
application PCT/US95/12678. In this embodiment, primer binding site (400) is
engineered into the reference DNA (402) distal to tag complement (406), as shown in
Figure 4a. After isolating a microparticle, e.g. by sorting into a separate microtiter well,
or the like, the differentially expressed strands are melted off, primer (404) is added,
and a conventional Sanger sequencing reaction is carried out so that labeled extension
products are formed. These products are then separated by electrophoresis, or like
techniques, for sequence determination. In a similar embodiment, sequencing
templates may be produced without sorting individual microparticles. Primer binding
sites (400) and (420) may be used to generate templates by PCR using primers (404)
and (422). The resulting amplicons containing the templates are then cloned into a
conventional sequencing vector, such as M13. After transfection, hosts are plated and
individual clones are selected for sequencing.

In another embodiment, illustrated in Figure 4b, primer binding site (412) may
be engineered into the competitively hybridized strands (410). This site need not have
a complementary strand in the reference DNA (402). After sorting, competitively
hybridized strands (410) are melted off of reference DNA (402) and amplified, e.g. by
PCR, using primers (414) and (416), which may be labeled and/or derivatized with
biotin for easier manipulation. The melted and amplified strands are then cloned into
a conventional sequencing vector, such as M13, which is used to transfect a host
which, in turn, is plated. Individual colonies are picked for sequencing.



Example 1 

Tagged cDNA Library, Sampling, and Loading of cDNAs onto Microparticles



In this example, a preferred protocol for preparing tagged reference DNA for
loading onto microparticles is described. Briefly, cDNA from each of the cell or
tissue types of interest is prepared and directionally cloned into a vector containing
the tag element of Formula I. Preferably, the mRNA extracted from such cells or
tissues is combined, usually in equal proportions, prior to first strand synthesis.
mRNA is obtained using standard protocols, after which first and second strand
synthesis is carried out as exemplified and the resulting cDNAs are inserted into a
vector containing a tag element of Formula I, or like tag element. The vectors
containing the tag-cDNA conjugates are then used to transform a suitable host,
typically a conventional bacterial host, after which a sample of cells from the host
culture is further expanded and vector DNA is extracted. The tag-cDNA conjugates
are preferably amplified from the vectors by PCR and processed as described below
for loading onto microparticles derivatized with tag complements. After the
non-covalently attached strand is melted off, the cDNA-containing microparticles are
ready to accept competitively hybridized gene products in accordance with the
invention. Specific guidance relating to the indicated steps is available in Sambrook
et al. (cited above); Ausubel et al., editors, Current Protocols in Molecular Biology
(John Wiley & Sons, New York, 1995); and like guides on molecular biology
techniques.

A pellet of approximately 5 µg of mRNA is resuspended in 45 µl (final
volume) of a first strand pre-mix consisting of 10 µl 5x Superscript buffer (250 mM
Tris-Cl, pH 8.3, 375 mM KCl, and 15 mM MgCl2) (GIBCO/BRL) (or like reverse
transcriptase buffer), 5 µl 0.1 M dithiothreitol (DTT), 2.5 µl 3dNTP/methyl-dCTP
mix (10 mM each of dATP, dGTP, dTTP, and 5-methyl-dCTP, e.g. available from
Pharmacia Biotech), 1 µl RNasin, 12 µl 0.25 µg/µl of the reverse transcription primer
shown below, and 14.5 µl H2O.
5'-biotin-GACATGCTGCATTGAGACGATTC(T)nV-3'
Reverse Transcription Primer (SEQ ID NO: 2)


After incubation for 15 min at room temperature, 5 µl of 200 U/µl Superscript is
added and the mixture is incubated for 1 hr at 42°C. After the 1 hr incubation, the
above mixture (about 50 µl total) is added to a second-strand premix on ice (volume
336 µl) consisting of 80 µl 5x second-strand buffer (94 mM Tris-Cl, pH 6.9, 453 mM
KCl, 23 mM MgCl2, and 50 mM (NH4)2SO4) to give a total reaction volume of about
386 µl. Separately, 4 µl of 0.8 U/µl RNase H (3.2 units) and 10 µl of 10 unit/µl E.
coli DNA polymerase I (100 units) are combined and the combined enzyme mixture
is added to the above second-strand reaction mixture, after which the reaction
mixture is microfuged 5 sec and then incubated for 1 hr at 16°C and for 1 hr at room
temperature to give the following double stranded cDNA (SEQ ID NO: 3):

5'-biotin-GACATGCTGCATTGAGACGATTC(T)nVXXX...XGATCXXX-3'
       3'-CTGTACGACGTAACTCTGCTAAG(A)nBXXX...XCTAGXXX-5'
                  ↑   ↑
                  BsmBI

where (T)n and (A)n denote the oligo(dT) stretch and its complement, the X's indicate
nucleotides in the cDNAs, V represents A, C, or G, and B represents C, G, or T. Note
that the reverse transcription primer sequence has been selected to give a Bsm BI site
in the cDNAs which results in a 5'-GCAT overhang upon digestion with Bsm BI.

After phenol/chloroform extraction and ethanol precipitation, the cDNA is
resuspended in the manufacturer's recommended buffer for digestion with Dpn II
(New England Biolabs, Beverly, MA), which is followed by capture of the
biotinylated fragment on avidinated beads (Dynal, Oslo, Norway). After washing, the
captured fragments are digested with Bsm BI to release the following cDNAs (SEQ
ID NO: 4), which are precipitated in ethanol:

5'-GCATTGAGACGATTC(T)nVXXX...X-3'
    3'-ACTCTGCTAAG(A)nBXXX...XCTAG-5'

A conventional cloning vector, such as Bluescript II, pBC, or the like (Stratagene
Cloning Systems, La Jolla, CA), is engineered to have the following sequence of
elements (SEQ ID NO: 5), which are those shown in Formula I:



5'-...TTAATTAAGGA [TAG] GGGCCCGCATAAGTCTTC [STUFFER] GGATCC...-3'
3'-...AATTAATTCCT [TAG] CCCGGGCGTATTCAGAAG [STUFFER] CCTAGG...-5'
         ↑                          ↑                 ↑
       Pac I                      Bbs I             Bam HI

After digestion with Bbs I and Bam HI, the vector is purified by gel electrophoresis
and combined with the cDNAs for ligation. Note that the vector has been engineered
so that the Bbs I digestion results in an end compatible with the Bsm BI-digested end
of the cDNAs. After ligation, a suitable bacterial host is transformed and a culture is
expanded for subsequent use.
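The compatibility of the Bbs I-digested vector end with the Bsm BI-digested cDNA end can be checked from the legible, non-degenerate parts of the primer and vector sequences. The Python sketch uses the standard Type IIS cut specifications, BsmBI CGTCTC(1/5) and BbsI GAAGAC(2/6); in both constructs the recognition site lies in reverse orientation on the top strand, so the cuts fall upstream of it. Two 5' overhangs anneal when one is the reverse complement of the other.

```python
def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def single_stranded_region(top: str, site: str, near_cut: int, far_cut: int) -> str:
    """Top-strand coordinates of the overhang left by a Type IIS enzyme with
    specification site(near_cut/far_cut) whose site is reverse-oriented on `top`.
    The downstream piece carries this region as a top-strand 5' overhang; the
    upstream piece carries its reverse complement as a bottom-strand 5' overhang."""
    idx = top.index(revcomp(site))
    return top[idx - far_cut: idx - near_cut]


# BsmBI, CGTCTC(1/5): fixed part of the reverse transcription primer (biotin and oligo(dT) omitted)
primer_top = "GACATGCTGCATTGAGACGATTC"
cdna_overhang = single_stranded_region(primer_top, "CGTCTC", 1, 5)      # on the released cDNA

# BbsI, GAAGAC(2/6): vector segment between the tag and the stuffer
vector_top = "GGGCCCGCATAAGTCTTC"
vector_overhang = revcomp(single_stranded_region(vector_top, "GAAGAC", 2, 6))  # on the tag side

print(cdna_overhang, vector_overhang, revcomp(cdna_overhang) == vector_overhang)
```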

From the expanded culture, a sample of host cells is plated to determine the
fraction that carry vectors with inserted cDNAs, after which an aliquot of culture
corresponding to about 1.7 x 10^5 insert-containing cells is withdrawn and separately
expanded in culture. This represents about one percent of the repertoire of tags of the
type illustrated in Formula I.

Preferably, the tag-cDNA conjugates are amplified out of the vectors by PCR
using a conventional protocol, such as the following. For each of 8 replicate PCRs,
the following reaction components are combined: 1 µl vector DNA (125 ng/µl for a
library, 10' copies for a single clone); 10 µl 10x Klentaq Buffer (Clontech
Laboratories, Palo Alto, CA); 0.25 µl biotinylated 20-mer "forward" PCR primer (1
nmol/µl); 0.25 µl FAM-labeled 20-mer "reverse" PCR primer (1 nmol/µl); 1 µl 25
mM each of dATP, dGTP, dTTP, and 5-methyl-dCTP (total dNTP concentration 100 mM);
5 µl DMSO; 2 µl 50x Klentaq enzyme; and 80.5 µl H2O (for a total volume of 100 µl).
The PCR is run in an MJR DNA Engine (MJ Research), or like thermal cycler, with
the following protocol: 1) for 4 min; 2) 94°C 30 sec; 3) 67°C 3 min; 4) 8 cycles
of steps 2 and 3; 5) 94°C 30 sec; 6) 64°C 3 min; 7) 22 cycles of steps 5 and 6; 8) 67°C
for 3 min; and 9) hold at 4°C.
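The final concentrations in each 100 µl PCR follow from C_final = C_stock × V_added / V_total. A Python sketch of that bookkeeping for the mix above; it treats the 1 nmol/µl primer stocks as 1 mM and assumes the DMSO is neat, and the component names are illustrative.

```python
def final_concentrations(components, total_volume_ul):
    """Final concentration of each component after mixing (C_final = C_stock * V_added / V_total).

    components maps a name to (volume_ul, stock_concentration, unit); use None as the
    stock concentration for diluents such as water.
    """
    added = sum(volume for volume, _, _ in components.values())
    if abs(added - total_volume_ul) > 1e-9:
        raise ValueError(f"components sum to {added} µl, expected {total_volume_ul} µl")
    return {
        name: (stock * volume / total_volume_ul, unit)
        for name, (volume, stock, unit) in components.items()
        if stock is not None
    }


pcr_mix = {
    "vector DNA (library)": (1.0, 125.0, "ng/µl"),
    "10x Klentaq buffer": (10.0, 10.0, "x"),
    "forward primer": (0.25, 1000.0, "µM"),   # 1 nmol/µl = 1 mM = 1000 µM
    "reverse primer": (0.25, 1000.0, "µM"),
    "each dNTP": (1.0, 25.0, "mM"),
    "DMSO": (5.0, 100.0, "% v/v"),            # assumes neat DMSO
    "50x Klentaq enzyme": (2.0, 50.0, "x"),
    "water": (80.5, None, ""),
}

for name, (conc, unit) in final_concentrations(pcr_mix, 100.0).items():
    print(f"{name}: {conc:g} {unit}")
```

Checking that the volumes add up to exactly 100 µl catches transcription errors in a recipe before any concentrations are reported.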

The 8 PCR mixtures are pooled and 700 µl phenol is added at room
temperature, after which the combined mixture is vortexed for 20-30 sec and then
centrifuged at high speed (e.g. 14,000 rpm in an Eppendorf bench top centrifuge, or
like instrument) for 3 min. The supernatant is removed and combined with 700 µl
chloroform (24:1 mixture of chloroform:isoamyl alcohol) in a new tube, vortexed for
20-30 sec, and centrifuged for 1 min, after which the supernatant is transferred to a
new tube and combined with 80 µl 3 M sodium acetate and 580 µl isopropanol. After
centrifuging for 20 min, the supernatant is removed and 1 ml 70% ethanol is added.
The mixture is centrifuged for 5-10 min, after which the ethanol is removed and the
precipitated DNA is dried in a speedvac.

After resuspension, the cDNA is purified on avidinated magnetic beads
(Dynal) using the manufacturer's recommended protocol and digested with Pac I (1
unit of enzyme per µg of DNA), also using the manufacturer's recommended protocol
(New England Biolabs, Beverly, MA). The cleaved DNA is extracted with
phenol/chloroform followed by ethanol precipitation. The tags of the tag-cDNA
conjugates are rendered single stranded by combining 2 units of T4 DNA polymerase
(New England Biolabs) per µg of streptavidin-purified DNA. 150 µg of
streptavidin-purified DNA is resuspended in 200 µl H2O and combined with the following reaction
components: 30 µl 10x NEB Buffer No. 2 (New England Biolabs); 9 µl 100 mM
dGTP; 30 µl T4 DNA polymerase (10 units/µl); and 31 µl H2O; to give a final
reaction volume of 300 µl. After incubation for 1 hr at 37°C, the reaction is stopped
by adding 20 µl 0.5 M EDTA, and the T4 DNA polymerase is inactivated by
incubating the reaction mixture for 20 min at 75°C. The tag-cDNA conjugates are
purified by phenol/chloroform extraction and ethanol precipitation.
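The T4 DNA polymerase step can be cross-checked arithmetically: 30 µl of enzyme at 10 units/µl acting on 150 µg of DNA corresponds to 2 units per microgram, and the listed volumes sum to the stated 300 µl. A minimal Python sketch of that check, with illustrative component names:

```python
def units_per_microgram(enzyme_volume_ul: float, units_per_ul: float, dna_ug: float) -> float:
    """Enzyme units supplied per microgram of DNA."""
    return enzyme_volume_ul * units_per_ul / dna_ug


t4_reaction_ul = {
    "DNA (150 µg) in water": 200.0,
    "10x NEB Buffer No. 2": 30.0,
    "100 mM dGTP": 9.0,
    "T4 DNA polymerase, 10 units/µl": 30.0,
    "water": 31.0,
}
total_ul = sum(t4_reaction_ul.values())
print(total_ul)                                  # final reaction volume in µl
print(units_per_microgram(30.0, 10.0, 150.0))    # units of T4 DNA polymerase per µg of DNA
print(100.0 * 9.0 / total_ul)                    # final dGTP concentration in mM
```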

5 µm GMA beads with tag complements are prepared by combinatorial
synthesis on an automated DNA synthesizer (Gene Assembler Special/4 Primers,
Pharmacia Biotech, Bjorkgatan, Sweden, or like instrument) using conventional
phosphoramidite chemistry, wherein nucleotides are condensed in the 3'->5' direction.
In a preferred embodiment, a 28-nucleotide "spacer" sequence is synthesized,
followed by the tag complement sequence (8 "words" of 4 nucleotides each for a total
of 32 nucleotides in the tag complement), and a sequence of three Cs. Thus, the
beads are derivatized with a 63-mer oligonucleotide. The length of the "spacer"
sequence is not critical; however, the proximity of the bead surface may affect the
activity of enzymes that are used to treat tag complements or captured sequences.
Therefore, if such processing is employed, a spacer long enough to avoid such surface
effects is desirable. Preferably, the spacer is between 10 and 30 nucleotides,
inclusive. The following sequence (SEQ ID NO: 6), containing a Pac I site, is
employed in the present embodiment:

5'-CCC-[Tag Complement]-TGCTTAATTAACTGGTCTCACTGTCGCA-bead
                           ^^^^^^^^
                             Pac I

Preferably, the tag-cDNA conjugates are hybridized to tag complements on
beads of a number corresponding to at least a full repertoire of tag complements,
which in the case of the present embodiment is 8^8, or about 1.6 x 10^7 beads. The
number of beads in a given volume is readily estimated with a hemocytometer.
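The length of the bead oligonucleotide and the size of the tag repertoire follow from simple arithmetic, which the Python sketch below makes explicit. It assumes the 28-nucleotide spacer of SEQ ID NO: 6 reads 5'-TGCTTAATTAACTGGTCTCACTGTCGCA-3' (bead at the 3' end) and uses eight made-up 4-nucleotide words as a hypothetical tag complement; the actual word set is not reproduced here, and the names `bead_oligo`, `repertoire_size`, and `HYPOTHETICAL_WORDS` are illustrative only.

```python
# Sketch: bead oligonucleotide of SEQ ID NO: 6 and the size of the tag repertoire.
PAC_I_SITE = "TTAATTAA"  # Pac I recognition sequence

# Assumed reading of the 28-nt spacer; the bead is attached at its 3' end.
SPACER = "TGCTTAATTAACTGGTCTCACTGTCGCA"

# Hypothetical 4-nt words; the real word set is defined elsewhere in the patent.
HYPOTHETICAL_WORDS = ["AGTT", "TAGA", "GTAT", "ATGA", "TTGA", "GAAT", "AGAT", "TGTA"]


def bead_oligo(tag_complement: str, spacer: str = SPACER) -> str:
    """Return the oligonucleotide 5'-CCC-[tag complement]-[spacer]-3'."""
    return "CCC" + tag_complement + spacer


def repertoire_size(words_per_position: int, positions: int) -> int:
    """Number of distinct tag complements that can be built from the word set."""
    return words_per_position ** positions


tag_complement = "".join(HYPOTHETICAL_WORDS)  # 8 words x 4 nt = 32 nt
oligo = bead_oligo(tag_complement)

print(len(SPACER), len(tag_complement), len(oligo))  # 28 32 63
print(oligo.find(PAC_I_SITE))  # 38: 0-based start of the Pac I site
print(repertoire_size(8, 8))   # 16777216
```

With 8 positions and 8 possible words per position there are 8^8 = 16,777,216 distinct tag complements, which is the full repertoire the bead count is meant to cover.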
Prior to hybridization of the tag-cDNA conjugates, the 5' ends of the tag
complements are phosphorylated, preferably by treatment with a polynucleotide
kinase. Briefly, 2.5 x 10* beads suspended in 100 µl H2O are combined with 100 µl
10x NEB buffer No. 2 (New England Biolabs, Beverly, MA), 10 µl 100 mM ATP,
1 µl 10% Tween 20, 17 µl T4 polynucleotide kinase (10 units/µl), and 772 µl H2O for
a final volume of 1000 µl. After incubating for 2 hr at 37°C with vortexing, the
temperature is increased to 65°C for 20 min to inactivate the kinase, with continued
vortexing. After incubation, the beads are washed twice by spinning down the beads
and resuspending them in 1 ml TE (Sambrook et al., Molecular Cloning, Second
Edition, Cold Spring Harbor Laboratory) containing 0.01% Tween 20.

For hybridization of tag-cDNA conjugates to tag complements, the tag-cDNA
conjugates as prepared above are suspended in 50 µl H2O and the resulting mixture is
combined with 40 µl 2.5x hybridization buffer, after which the combined mixture is
filtered through a Spin-X spin column (0.22 µm) using a conventional protocol to
give a filtrate containing the tag-cDNA conjugates. (5 ml of the 2.5x hybridization
buffer consists of 1.25 ml 0.1 M NaPO4 (pH 7.2), 1.25 ml 5 M NaCl, 0.25 ml 0.5%
Tween 20, 1.50 ml 25% dextran sulfate, and 0.75 ml H2O.) Approximately 1.8 x 10^
beads in 10 µl TE/Tween buffer (TE with 0.01% Tween 20) are centrifuged so that the
beads form a pellet and the TE/Tween is removed. To the beads, 25 µl of 1x
hybridization buffer (10 mM NaPO4 (pH 7.2), 500 mM NaCl, 0.01% Tween 20, 3%
dextran sulfate) is added and the mixture is vortexed to fully resuspend the beads,
after which the mixture is centrifuged so that the beads form a pellet and the
supernatant is removed.
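The 1x composition quoted above follows from the 2.5x stock recipe by ordinary C1V1 = C2V2 arithmetic. The Python sketch below recomputes it, expressing each stock in the same unit as the stated 1x concentration; `one_x_composition` is an illustrative helper name.

```python
# Sketch: recompute the 1x hybridization buffer from the 2.5x stock recipe.
import math

# component: (stock concentration, volume added in ml, unit of the concentration)
RECIPE_2_5X = {
    "NaPO4 (pH 7.2)": (100.0, 1.25, "mM"),   # 0.1 M stock
    "NaCl": (5000.0, 1.25, "mM"),            # 5 M stock
    "Tween 20": (0.5, 0.25, "%"),
    "dextran sulfate": (25.0, 1.50, "%"),
}
WATER_ML = 0.75
TOTAL_ML = 5.0


def one_x_composition(recipe, total_ml, fold=2.5):
    """Concentration of each component after diluting the stock `fold` times."""
    return {name: (stock * vol / total_ml / fold, unit)
            for name, (stock, vol, unit) in recipe.items()}


volume = sum(vol for _, vol, _ in RECIPE_2_5X.values()) + WATER_ML
assert math.isclose(volume, TOTAL_ML)  # the recipe adds up to 5 ml

composition = one_x_composition(RECIPE_2_5X, TOTAL_ML)
for name, (conc, unit) in composition.items():
    print(f"{name}: {conc:g} {unit}")
# Expected values: NaPO4 10 mM, NaCl 500 mM, Tween 20 0.01 %, dextran sulfate 3 %
```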

The tag-cDNA conjugates in the above filtrate are incubated at 75°C for 3 min
and combined with the beads, after which the mixture is vortexed to fully resuspend
the beads. The resulting mixture is further incubated at 75°C with vortexing for
approximately three days (60 hours). After hybridization, the mixture is centrifuged
for 2 min and the supernatant is removed, after which the beads are washed twice with
500 µl TE/Tween and resuspended in 500 µl 1x NEB buffer No. 2 with 0.01% Tween 20.
The beads are incubated at 64°C in this solution for 30 min, after which the
mixture is centrifuged so that the beads form a pellet, the supernatant is removed, and
the beads are resuspended in 500 µl TE/Tween.

Loaded beads are sorted from unloaded beads using a high speed cell sorter,
preferably a MoFlo flow cytometer equipped with an argon ion laser operating at 488
nm (Cytomation, Inc., Ft. Collins, CO), or like instrument. After sorting, the loaded
beads are subjected to a fill-in reaction by combining them with the following
reaction components: 10 µl 10x NEB buffer No. 2, 0.4 µl 25 mM dNTPs, 1 µl 1%
Tween 20, 2 µl T4 DNA polymerase (10 units/µl), and 86.6 µl H2O, for a final
reaction volume of 100 µl. After incubation at 12°C for 30 min with vortexing, the
reaction mixture is centrifuged so that the beads form a pellet and the supernatant is
removed. The pelleted beads are resuspended in a ligation buffer consisting of 15 µl
10x NEB buffer No. 2, 1.5 µl 1% Tween 20, 1.5 µl 100 mM ATP, 1 µl T4 DNA
ligase (400 units/µl), and 131 µl H2O, to give a final volume of 150 µl. The ligation
reaction mixture is incubated at 37°C for 1 hr with vortexing, after which the beads
are pelleted and washed once with 1x phosphate buffered saline (PBS) with 1 mM
CaCl2. The beads are resuspended in 45 µl PBS (with 1 mM CaCl2) and combined
with 6 µl Pronase solution (10 mg/ml, Boehringer Mannheim, Indianapolis, IN), after
which the mixture is incubated at 37°C for 1 hr with vortexing. After centrifugation,
the loaded beads are washed twice with TE/Tween and then once with 1x NEB Dpn II
buffer (New England Biolabs, Beverly, MA).

The tag-cDNA conjugates loaded onto beads are cleaved with Dpn II to
produce a four-nucleotide protruding strand to which a complementary adaptor
carrying a 3'-label is ligated. Accordingly, the loaded beads are added to a reaction
mixture consisting of the following components: 10 µl 10x NEB Dpn II buffer, 1 µl
1% Tween 20, 4 µl Dpn II (50 units/µl), and 85 µl H2O, to give a final reaction volume
of 100 µl. The mixture is incubated at 37°C overnight with vortexing, after which the
beads are pelleted, the supernatant is removed, and the beads are washed once with 1x
NEB buffer No. 3. To prevent self-ligation, the protruding strands of the tag-cDNA
conjugates are treated with a phosphatase, e.g. calf intestine phosphatase (CIP), to
remove the 5' phosphates. Accordingly, the loaded beads are added to a reaction
mixture consisting of the following components: 10 µl 10x NEB buffer No. 3, 1 µl
1% Tween 20, 5 µl CIP (10 units/µl), and 84 µl H2O, to give a final reaction volume
of 100 µl. The resulting mixture is incubated at 37°C for 1 hr with vortexing, after
which the beads are pelleted, washed once in PBS containing 1 mM CaCl2, treated
with Pronase as described above, washed twice with TE/Tween, and washed once with
1x NEB buffer No. 2.
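Every reaction above is specified as a list of component volumes plus a final volume, so the recipes can be checked mechanically. The Python sketch below is a minimal check, assuming the volumes as written in the protocol (in µl; the 200 µl in the stripping reaction is the volume in which the DNA is resuspended). `REACTIONS` and `final_conc` are illustrative names, and the last line recomputes the T4 DNA polymerase dose from 30 µl at 10 units/µl and 150 µg of DNA.

```python
# Sketch: check Example 1 reaction recipes (volumes in ul) against the stated
# final volumes and compute a few final concentrations (C1 * V1 = C2 * V2).
import math

REACTIONS = {
    # name: ([component volumes], stated final volume)
    "T4 DNA polymerase stripping": ([200, 30, 9, 30, 31], 300),
    "5' phosphorylation (kinase)": ([100, 100, 10, 1, 17, 772], 1000),
    "fill-in": ([10, 0.4, 1, 2, 86.6], 100),
    "ligation": ([15, 1.5, 1.5, 1, 131], 150),
    "Dpn II digestion": ([10, 1, 4, 85], 100),
    "CIP treatment": ([10, 1, 5, 84], 100),
}


def final_conc(stock: float, stock_ul: float, total_ul: float) -> float:
    """Concentration after diluting stock_ul of a stock into total_ul."""
    return stock * stock_ul / total_ul


for name, (parts, stated) in REACTIONS.items():
    total = sum(parts)
    status = "OK" if math.isclose(total, stated) else "MISMATCH"
    print(f"{name}: {total:g} ul (stated {stated} ul) -> {status}")

print(final_conc(100, 9, 300))    # dGTP in the stripping reaction, mM
print(final_conc(100, 10, 1000))  # ATP in the kinase reaction, mM
print(final_conc(25, 0.4, 100))   # dNTPs in the fill-in reaction, mM
print(30 * 10 / 150)              # T4 DNA polymerase units per ug of DNA
```

A mismatch on any line would point to a transcription error in the corresponding recipe.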

The following 3'-labeled adaptor (SEQ ID NO: 7) is prepared using
conventional reagents, e.g. Clontech Laboratories (Palo Alto, CA):

5'-pGATCACGAGCTGCCAGTC-FAM
        TGCTCGACGGTCAG

where "p" is a 5' phosphate group and "FAM" is a fluorescein dye attached to the 3'
carbon of the last nucleotide of the top strand by a commercially available 3' linker
group (Clontech Laboratories). The ligation is carried out in the following reaction
mixture: 5 µl 10x NEB buffer No. 2, 0.5 µl 1% Tween 20, 0.5 µl 100 mM ATP, 5 µl
3'-labeled adaptor (100 pmol/µl), 2.5 µl T4 DNA ligase (400 units/µl), and 36.5 µl
H2O, to give a final reaction volume of 50 µl. The reaction mixture is incubated at
16°C overnight with vortexing, after which the beads are washed once with PBS
containing 1 mM CaCl2 and treated with Pronase as described above. After this initial
ligation, the nick remaining between the adaptor and tag-cDNA conjugate is sealed by
simultaneously treating with both a kinase and a ligase as follows. Loaded beads are
resuspended in a reaction mixture consisting of the following components: 15 µl 10x
NEB buffer No. 2, 1.5 µl 1% Tween 20, 1.5 µl 100 mM ATP, 2 µl T4 polynucleotide
kinase (10 units/µl), 1 µl T4 DNA ligase (400 units/µl), and 129 µl H2O, for a final
reaction volume of 150 µl. The reaction mixture is incubated at 37°C for 1 hr with
vortexing, after which the beads are washed once with PBS containing 1 mM CaCl2,
treated with Pronase as described above, and washed twice with TE/Tween.
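Dpn II cuts immediately before GATC, leaving a self-complementary 5'-GATC protrusion on the tag-cDNA conjugate, so the adaptor of SEQ ID NO: 7 should carry the same four-nucleotide protrusion followed by a fully paired region. The Python sketch below checks this, assuming the bottom strand (written 3'->5') pairs with top-strand positions 5-18; the function name is illustrative.

```python
# Sketch: check the protrusion and base pairing of the adaptor of SEQ ID NO: 7.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

TOP_5_TO_3 = "GATCACGAGCTGCCAGTC"  # 5' phosphate and 3' FAM left out
BOTTOM_3_TO_5 = "TGCTCGACGGTCAG"   # written 3'->5', aligned under the top strand


def protrusion_and_pairing(top: str, bottom_3_to_5: str):
    """Split off the unpaired 5' protrusion of the top strand and report whether
    the rest of the top strand is Watson-Crick complementary to the bottom strand."""
    n = len(top) - len(bottom_3_to_5)
    paired = top[n:].translate(COMPLEMENT) == bottom_3_to_5
    return top[:n], paired


protrusion, paired = protrusion_and_pairing(TOP_5_TO_3, BOTTOM_3_TO_5)
print(protrusion, paired)  # GATC True
```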

After the labeled strand is melted off, preferably by treatment with 150 mM
NaOH, the reference DNA on the beads is ready for competitive hybridization of
differentially expressed gene products.

Example 2

Preparation of a Yeast Reference DNA Population
Attached to Microparticles

In this example, Saccharomyces cerevisiae cells of strain YJM920 MATa
Gal+ SUC2 CUP1 are grown in separate rich and minimal media cultures essentially
as described by Wodicka et al. (cited above). mRNA extracted from cells grown under
both conditions is used to establish a reference cDNA population which is tagged,
sampled, amplified, labeled, and loaded onto microparticles. Loaded microparticles
are isolated by FACS, labels are removed, and the non-covalently bound strands of
the loaded DNA are melted off and removed.

Yeast cells are grown at 30°C either in rich medium consisting of YPD (yeast
extract/peptone/glucose, Bufferad, Newark, NJ) or in minimal medium (yeast nitrogen
base without amino acids, plus glucose, Bufferad). Cell density is measured by
counting cells from duplicate dilutions, and the number of viable cells per milliliter is
estimated by plating dilutions of the cultures on YPD agar immediately before
collecting cells for mRNA extraction. Cells in mid-log phase (1-5 x 10^7 cells/ml) are
pelleted, washed twice with AE buffer solution (50 mM NaAc, pH 5.2, 10 mM
EDTA), frozen in a dry ice-ethanol bath, and stored at -80°C.

mRNA is extracted as follows for both the construction of the reference DNA
library and for preparation of DNA for competitive hybridization. Total RNA is
extracted from frozen cell pellets using a hot phenol method, described by Schmitt et
al., Nucleic Acids Research, 18: 3091-3092 (1990), with the addition of a
chloroform-isoamyl alcohol extraction just before precipitation of the total RNA.
Phase-Lock Gel (5 Prime-3 Prime, Inc., Boulder, CO) is used for all organic
extractions to increase RNA recovery and decrease the potential for contamination of
the RNA with material from the organic interface. Poly(A)+ RNA is purified from the
total RNA with an oligo-dT selection step (Oligotex, Qiagen, Chatsworth, CA).

5 µg each of mRNA from cells grown on rich medium and minimal medium
are mixed for construction of a cDNA library in a pUC19 containing the tag repertoire
of Formula I. The tag repertoire of Formula I is digested with Eco RI and Bam HI
and inserted into a similarly digested pUC19. The mRNA is reverse transcribed with
a commercially available kit (Stratagene, La Jolla, CA) using an oligo-dT primer
containing a sequence which generates a Bsm BI site identical to that of Formula I
upon second strand synthesis. The resulting cDNAs are cleaved with Bsm BI and
Dpn II and inserted into the tag-containing pUC19 after digestion with Bsm BI and
Bam HI. After transfection and colony formation, the density of pUC19 transformants
is determined so that a sample containing approximately thirty thousand tag-cDNA
conjugates may be obtained and expanded in culture. Alternatively, a sample of
tag-cDNA conjugates is obtained by picking approximately 30 thousand clones, which
are then mixed and expanded in culture.
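Joining Dpn II-cut cDNA ends to Bam HI-cut vector ends works because both enzymes leave the same four-nucleotide 5' protrusion, GATC. The Python sketch below derives the 5' overhang from the recognition sequence and the top-strand cut position for a few palindromic enzymes; the helper names are illustrative. Bsm BI is left out because, as a type IIS enzyme, its overhang depends on the bases next to its site rather than on the site itself.

```python
# Sketch: 5' overhangs left by palindromic restriction enzymes, and whether two
# cut ends can anneal.
ENZYMES = {
    # name: (recognition sequence, top-strand cut position within the site)
    "Bam HI": ("GGATCC", 1),  # G^GATCC
    "Dpn II": ("GATC", 0),    # ^GATC
    "Eco RI": ("GAATTC", 1),  # G^AATTC
}


def five_prime_overhang(enzyme: str) -> str:
    """Bases left single stranded when a palindromic site is cut symmetrically."""
    site, cut = ENZYMES[enzyme]
    # The bottom strand is cut at len(site) - cut, mirroring the top-strand cut.
    return site[cut:len(site) - cut]


def compatible(e1: str, e2: str) -> bool:
    """For these palindromic overhangs, identical overhangs can anneal."""
    return five_prime_overhang(e1) == five_prime_overhang(e2)


print(five_prime_overhang("Dpn II"), five_prime_overhang("Bam HI"))  # GATC GATC
print(compatible("Dpn II", "Bam HI"), compatible("Dpn II", "Eco RI"))  # True False
```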

From a standard miniprep of plasmid, the tag-cDNA conjugates are amplified
by PCR with 5-methyldeoxycytosine triphosphate substituted for deoxycytosine
triphosphate. The following 19-mer forward and reverse primers (SEQ ID NO: 8 and
SEQ ID NO: 9), specific for flanking sequences in pUC19, are used in the reaction:



forward primer: 5'-biotin-AGTGAATTCGGGCCTTAATTAA

reverse primer: 5'-FAM-GTACCCGCGGCCGCGGTCGACTCTAGAGGATC

where "FAM" is an NHS ester of fluorescein (Clontech Laboratories, Palo Alto, CA)
coupled to the 5' end of the reverse primer via an amino linkage, e.g. Aminolinker II
(Perkin-Elmer, Applied Biosystems Division, Foster City, CA). The reverse primer is
selected so that a Not I site is reconstituted in the double stranded product. After PCR
amplification, the tag-cDNA conjugates are isolated on avidinated beads, e.g. M-280
Dynabeads (Dynal, Oslo, Norway).
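The restriction sites this protocol relies on can be located directly in the primer sequences. The Python sketch below scans SEQ ID NO: 8 and SEQ ID NO: 9 for the Eco RI, Pac I, and Not I recognition sequences; because these sites are palindromic, scanning one strand finds them on both. The helper name `find_sites` is illustrative.

```python
# Sketch: locate restriction sites in the PCR primers (SEQ ID NO: 8 and 9).
SITES = {
    "Eco RI": "GAATTC",
    "Pac I": "TTAATTAA",
    "Not I": "GCGGCCGC",
}

FORWARD = "AGTGAATTCGGGCCTTAATTAA"            # 5' biotin left out
REVERSE = "GTACCCGCGGCCGCGGTCGACTCTAGAGGATC"  # 5' FAM left out


def find_sites(seq: str) -> dict:
    """Return {enzyme: [0-based start positions]} for each site present."""
    hits = {}
    for enzyme, site in SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)  # also catches overlapping sites
        if positions:
            hits[enzyme] = positions
    return hits


print(find_sites(FORWARD))  # {'Eco RI': [3], 'Pac I': [14]}
print(find_sites(REVERSE))  # {'Not I': [6]}
```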




SUBSTITUTE SHEET (RULE 26) 



PCT/US99/00666

WO 99/35293 

After washing, the cDNAs bound to the beads are digested with Pac I,
releasing the tag-cDNA conjugates, and a stripping reaction is carried out to render
the oligonucleotide tags single stranded. After the reaction is quenched, the tag-cDNA
conjugate is purified by phenol-chloroform extraction and combined with 5.5 µm
GMA beads carrying tag complements, each tag complement having a 5' phosphate.
Hybridization is conducted under stringent conditions in the presence of a
thermostable ligase so that only tags forming perfectly matched duplexes with their
complements are ligated. The GMA beads are washed and the loaded beads are
concentrated by FACS sorting, using the fluorescently labeled cDNAs to identify
loaded GMA beads. The isolated beads are treated with Pac I to remove the
fluorescent label, after which the beads are heated in an NaOH solution using
conventional protocols to remove the non-covalently bound strand. After several
washes the GMA beads are ready for competitive hybridization.



Example 3

Isolation and Identification of Up-Regulated and Down-Regulated
Genes in Cells Exposed to Different Growth Conditions

In this example, mRNA is extracted from cells of each culture and two
populations of labeled polynucleotides are produced by a single round of poly(dT)
primer extension by a reverse transcriptase in the presence of fluorescently labeled
nucleoside triphosphates. Equal amounts of each of the labeled polynucleotides are
then combined with the GMA beads of Example 1 carrying the reference DNA
population for competitive hybridization, after which the beads are analyzed by FACS
and those in the off-diagonal regions are accumulated for MPSS analysis.

Fluorescent nucleoside triphosphates Cy3-dUTP or CY5-dUTP (Amersham)
are incorporated into cDNAs during reverse transcription of 1 µg of poly(A)+ RNA
obtained as described in Example 1 using a poly(dT)16 primer in separate reactions.
After heating the primer and RNA to 70°C for 10 min, the reaction mixture is
transferred to ice and a premixed solution, consisting of 200 U Superscript II (Gibco),
buffer, deoxyribonucleoside triphosphates, and fluorescent nucleoside triphosphates,
is added to give the following concentrations: 500 µM for dATP, dCTP, and dGTP;
200 µM for dTTP; and 100 µM for Cy3-dUTP or CY5-dUTP. After incubation
at 42°C for 2 hours, unincorporated fluorescent nucleotides are removed by first
diluting the reaction mixture with 470 µl of 10 mM Tris-HCl (pH 8.0)/1 mM EDTA
and then concentrating to about 5 µl using a Centricon-30 concentrator
(Amicon). Purified labeled cDNA from both reactions is combined and resuspended
in 11 µl of 3.5x SSC containing 10 µg poly(dA) and 0.3 µl of 10% SDS. Prior to
hybridization the solution is boiled for 2 min and allowed to cool to room
temperature, after which it is applied to the GMA beads and incubated for about 8-12
hours at 62°C. After washing twice in 2x SSC and 0.2% SDS, the GMA beads are
resuspended in NEB-2 buffer (New England Biolabs, Beverly, MA) and loaded in a
Coulter EPICS Elite ESP flow cytometer for analysis and sorting. In a
two-dimensional fluorescence intensity contour plot, the GMA beads generate a
pattern as shown in Figure 1a. Sorting parameters are set as shown in Figure 1b so
that GMA beads in the off-diagonal regions (112) are sorted and collected for MPSS
analysis.
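The sorting step keeps beads whose two fluorescence signals depart from the diagonal, that is, whose intensity ratio departs from 1. As an illustration only, the Python sketch below classifies single beads by their log2 intensity ratio. The 2-fold cutoff, the channel labels, and the assumption that both channels are already calibrated to comparable scales are hypothetical; in this example the sort regions are set graphically as in Figure 1b.

```python
# Sketch: assign a bead to the diagonal or to an off-diagonal sort region from
# its two fluorescence intensities (hypothetical 2-fold cutoff).
import math


def classify_bead(cy3: float, cy5: float, fold_cutoff: float = 2.0) -> str:
    """Return 'Cy3-high', 'Cy5-high', or 'diagonal' for one bead."""
    if cy3 <= 0 or cy5 <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(cy3 / cy5)
    limit = math.log2(fold_cutoff)
    if log_ratio >= limit:
        return "Cy3-high"
    if log_ratio <= -limit:
        return "Cy5-high"
    return "diagonal"


beads = [(1000, 980), (4200, 900), (300, 1500)]
print([classify_bead(a, b) for a, b in beads])
# ['diagonal', 'Cy3-high', 'Cy5-high']
```

Working with the log ratio makes the two off-diagonal regions symmetric: a 2-fold excess in either channel lies the same distance from the diagonal.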

The labeled cDNA strands are melted from the GMA beads and removed by
centrifugation. After several washes, a primer is annealed to the primer binding site
shown in Formula I and extended in a conventional polymerization reaction to
reconstitute the double stranded DNAs on the GMA beads, which include the Dpn II
site described above. After digestion with Dpn II, beads loaded with tag-cDNA
conjugates are placed in an instrument for MPSS analysis, as described in Albrecht et
al. (cited above).

The top strands of the following 16 sets of 64 encoded adaptors (SEQ ID NO:
10 through SEQ ID NO: 25) are each separately synthesized on an automated DNA
synthesizer (model 392, Applied Biosystems, Foster City, CA) using standard methods.
The bottom strand, which is the same for all adaptors, is synthesized separately and
then hybridized to the respective top strands:



SEQ ID NO.   Encoded Adaptor

10   5'-pANNNTACAGCTGCATCCCttggcgctgagg
            pATGCACGCGTAGGG-5'

11   5'-pNANNTACAGCTGCATCCCtgggcctgtaag
            pATGCACGCGTAGGG-5'

12   5'-pCNNNTACAGCTGCATCCCttgacgggtctc
            pATGCACGCGTAGGG-5'

13   5'-pNCNNTACAGCTGCATCCCtgcccgcacagt
            pATGCACGCGTAGGG-5'

14   5'-pGNNNTACAGCTGCATCCCttcgcctcggac
            pATGCACGCGTAGGG-5'

15   5'-pNGNNTACAGCTGCATCCCtgatccgctagc
            pATGCACGCGTAGGG-5'

16   5'-pTNNNTACAGCTGCATCCCttccgaacccgc
            pATGCACGCGTAGGG-5'

17   5'-pNTNNTACAGCTGCATCCCtgagggggatag
            pATGCACGCGTAGGG-5'

18   5'-pNNANTACAGCTGCATCCCttcccgctacac
            pATGCACGCGTAGGG-5'

19   5'-pNNNATACAGCTGCATCCCtgactccccgag
            pATGCACGCGTAGGG-5'

20   5'-pNNCNTACAGCTGCATCCCtgtgttgcgcgg
            pATGCACGCGTAGGG-5'

21   5'-pNNNCTACAGCTGCATCCCtctacagcagcg
            pATGCACGCGTAGGG-5'

22   5'-pNNGNTACAGCTGCATCCCtgtcgcgtcgtt
            pATGCACGCGTAGGG-5'

23   5'-pNNNGTACAGCTGCATCCCtcggagcaacct
            pATGCACGCGTAGGG-5'

24   5'-pNNTNTACAGCTGCATCCCtggtgaccgtag
            pATGCACGCGTAGGG-5'

25   5'-pNNNTTACAGCTGCATCCCtcccctgtcgga
            pATGCACGCGTAGGG-5'

where N is any of dA, dC, dG, or dT; p is a phosphate group; and the nucleotides
indicated in lower case letters are the 12-mer oligonucleotide tags. Each tag differs
from every other by 6 nucleotides. Equimolar quantities of each adaptor are
combined in NEB #2 restriction buffer (New England Biolabs, Beverly, MA) to form
a mixture at a concentration of 1000 pmol/µL.
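Each of the 16 adaptor sets fixes one base at one of the four overhang positions and leaves the other three degenerate, so a set contains 4^3 = 64 distinct adaptors. The Python sketch below enumerates them; `expand` is an illustrative helper, not part of the described method.

```python
# Sketch: expand the degenerate 4-nt overhangs of the 16 encoded adaptor sets
# (e.g. "ANNN") into individual sequences, where N is A, C, G, or T.
from itertools import product

BASES = "ACGT"


def expand(pattern: str) -> list:
    """All sequences matching a pattern of fixed bases and N positions."""
    choices = [BASES if ch == "N" else ch for ch in pattern]
    return ["".join(p) for p in product(*choices)]


# One set per (position, base): the fixed base sits at position 1, 2, 3, or 4.
overhangs = ["N" * i + b + "N" * (3 - i) for i in range(4) for b in BASES]
all_adaptors = [seq for o in overhangs for seq in expand(o)]

print(len(overhangs))          # 16 adaptor sets
print(len(expand("ANNN")))     # 64 adaptors per set
print(len(all_adaptors))       # 1024 adaptors in all
print(len(set(all_adaptors)))  # 256 distinct overhangs, each in 4 sets
```

Every 4-nucleotide overhang is present in exactly four sets, one per position, which is what allows each position to be read independently.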

Each of the 16 tag complements is separately synthesized as an
amino-derivatized oligonucleotide and labeled with a fluorescein molecule (using
an NHS-ester of fluorescein, available from Molecular Probes, Eugene, OR), which is
attached to the 5' end of the tag complement through a polyethylene glycol linker
(Clontech Laboratories, Palo Alto, CA). The sequences of the tag complements are
simply the 12-mer complements of the tags listed above.
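Assuming "complement" here means the reverse complement written 5'->3', which is the strand that forms an antiparallel duplex with the tag, a tag complement and the distance between two tags can be computed as sketched below in Python (helper names illustrative). The example uses the lower-case tags of SEQ ID NO: 10 and 11; applying `hamming` to every pair of the 16 tags would check the stated 6-nucleotide separation.

```python
# Sketch: derive a tag complement and measure how far apart two tags are.
COMPLEMENT = str.maketrans("acgt", "tgca")


def tag_complement(tag: str) -> str:
    """Reverse complement, written 5'->3', that hybridizes to the tag."""
    return tag.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


tag10 = "ttggcgctgagg"  # SEQ ID NO: 10
tag11 = "tgggcctgtaag"  # SEQ ID NO: 11
print(tag_complement(tag10))  # cctcagcgccaa
print(hamming(tag10, tag11))  # 6
```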

Ligation of the adaptors to the target polynucleotide is carried out in a mixture
consisting of 5 µl beads (20 mg), 3 µL NEB 10x ligase buffer, 5 µL adaptor mix (25
nM), 2.5 µL NEB T4 DNA ligase (2000 units/µL), and 14.5 µL distilled water. The
mixture is incubated at 16°C for 30 minutes, after which the beads are washed 3 times
in TE (pH 8.0).

After centrifugation and removal of TE, the 3' phosphates of the ligated
adaptors are removed by treating the polynucleotide-bead mixture with calf intestinal
alkaline phosphatase (CIP) (New England Biolabs, Beverly, MA), using the
manufacturer's protocol. After removal of the 3' phosphates, the CIP may be
inactivated by proteolytic digestion, e.g. using Pronase™ (available from Boehringer
Mannheim, Indianapolis, IN), or an equivalent protease, with the manufacturer's
protocol. The polynucleotide-bead mixture is then washed and treated with a mixture
of T4 polynucleotide kinase and T4 DNA ligase (New England Biolabs, Beverly, MA)
to add a 5' phosphate at the gap between the target polynucleotide and the adaptor,
and to complete the ligation of the adaptors to the target polynucleotide. The
bead-polynucleotide mixture is then washed in TE.

Separately, each of the labeled tag complements is applied to the
polynucleotide-bead mixture under conditions which permit the formation of perfectly
matched duplexes only between the oligonucleotide tags and their respective
complements, after which the mixture is washed under stringent conditions, and the
presence or absence of a fluorescent signal is measured. Tag complements are
applied in a solution consisting of 25 nM tag complement, 50 mM NaCl, 3 mM Mg,
10 mM Tris-HCl (pH 8.5), at 20°C, incubated for 10 minutes, then washed in the
same solution (without tag complement) for 10 minutes at 55°C.

After the four nucleotides are identified as described above, the encoded
adaptors are cleaved from the polynucleotides with Bbv I using the manufacturer's
protocol. After an initial ligation and identification, the cycle of ligation,
identification, and cleavage is repeated three times to give the sequence of the 16
terminal nucleotides of the target polynucleotide.
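Because each adaptor set encodes one base at one position, the tag complements that give a signal on a bead spell out the four nucleotides read in that cycle. The Python sketch below decodes such a set into the adaptor's overhang. It assumes exactly one positive tag per position, reads SEQ ID NO: 21 as the NNNC set, and uses a hypothetical `decode` function with made-up detection results; the corresponding bases of the target's protruding strand are the reverse complement of the decoded overhang.

```python
# Sketch: turn the tag complements that light up on one bead into the 4-nt
# adaptor overhang. (position, base) per SEQ ID NO follows the list above.
ENCODING = {
    10: (1, "A"), 11: (2, "A"), 12: (1, "C"), 13: (2, "C"),
    14: (1, "G"), 15: (2, "G"), 16: (1, "T"), 17: (2, "T"),
    18: (3, "A"), 19: (4, "A"), 20: (3, "C"), 21: (4, "C"),
    22: (3, "G"), 23: (4, "G"), 24: (3, "T"), 25: (4, "T"),
}


def decode(positive_seq_ids) -> str:
    """Return the overhang read from the adaptor tags that gave a signal.

    A position with no signal, or with more than one, is reported as 'N'.
    """
    calls = {1: [], 2: [], 3: [], 4: []}
    for seq_id in positive_seq_ids:
        position, base = ENCODING[seq_id]
        calls[position].append(base)
    return "".join(c[0] if len(c) == 1 else "N" for _, c in sorted(calls.items()))


print(decode({12, 15, 18, 25}))  # CGAT
print(decode({12, 15, 18}))      # CGAN (no signal at position 4)
```

Concatenating the four calls from successive ligation, identification, and cleavage cycles gives the 16 terminal nucleotides described above.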

Preferably, analysis of the hybridized encoded adaptors takes place in an
instrument which i) constrains the loaded microparticles to be disposed in a planar
array in a flow chamber, ii) permits the programmed delivery of process reagents to
the flow chamber, and iii) simultaneously detects optical signals from the array of
microparticles. Such a preferred instrument is shown diagrammatically in Figure 2,
and more fully disclosed in Bridgham et al., International patent application
PCT/US98/11224. Briefly, flow chamber (500) is prepared by etching a cavity
having a fluid inlet (502) and outlet (504) in a glass plate (506) using standard
micromachining techniques, e.g. Ekstrom et al., International patent application
PCT/SE91/00327; Brown, U.S. patent 4,911,782; Harrison et al., Anal. Chem. 64:
1926-1932 (1992); and the like. The dimensions of flow chamber (500) are such that
loaded microparticles (508), e.g. GMA beads, may be disposed in cavity (510) in a
closely packed planar monolayer of 100-200 thousand beads. Cavity (510) is made
into a closed chamber with inlet and outlet by anodic bonding of a glass cover slip
(512) onto the etched glass plate (506), e.g. Pomerantz, U.S. patent 3,397,279.
Reagents are metered into the flow chamber from syringe pumps (514 through 520)
through valve block (522) controlled by a microprocessor, as is commonly used on
automated DNA and peptide synthesizers, e.g. Bridgham et al., U.S. patent 4,668,479;
Hood et al., U.S. patent 4,252,769; Barstow et al., U.S. patent 5,203,368; Hunkapiller,
U.S. patent 4,703,913; or the like.

Three cycles of ligation, identification, and cleavage are carried out in flow
chamber (500) to give the sequences of 12 nucleotides at the termini of each of
approximately 100,000 fragments. Nucleotides of the fragments are identified by
hybridizing tag complements to the encoded adaptors as described above.
Specifically hybridized tag complements are detected by exciting their fluorescent
labels with illumination beam (524) from light source (526), which may be a laser,
mercury arc lamp, or the like. Illumination beam (524) passes through filter (528) and
excites the fluorescent labels on tag complements specifically hybridized to encoded
adaptors in flow chamber (500). Resulting fluorescence (530) is collected by
confocal microscope (532), passed through filter (534), and directed to CCD camera
(536), which creates an electronic image of the bead array for processing and analysis
by workstation (538). Preferably, after each ligation and cleavage step, the cDNAs
are treated with Pronase™ or like enzyme. Encoded adaptors and T4 DNA ligase
(Promega, Madison, WI) at about 0.75 units per µL are passed through the flow
chamber at a flow rate of about 1-2 µL per minute for about 20-30 minutes at 16°C,
after which 3' phosphates are removed from the adaptors and the cDNAs are prepared
for second strand ligation by passing a mixture of alkaline phosphatase (New England
Biolabs, Beverly, MA) at 0.02 units per µL and T4 DNA kinase (New England
Biolabs, Beverly, MA) at 7 units per µL through the flow chamber at 37°C with a
flow rate of 1-2 µL per minute for 15-20 minutes. Ligation is accomplished by
passing T4 DNA ligase (JS units per mL, Promega) through the flow chamber for
20-30 minutes. Tag complements at 25 nM concentration are passed through the flow
chamber at a flow rate of 1-2 µL per minute for 10 minutes at 20°C, after which
fluorescent labels carried by the tag complements are illuminated and fluorescence is
collected. The tag complements are melted from the encoded adaptors by passing
hybridization buffer through the flow chamber at a flow rate of 1-2 µL per minute at
55°C for 10 minutes. Encoded adaptors are cleaved from the cDNAs by passing
Bbv I (New England Biolabs, Beverly, MA) at 1 unit/µL at a flow rate of 1-2 µL per
minute for 20 minutes at 37°C.
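The flow-chamber cycle just described is essentially a timed reagent schedule. The Python sketch below encodes it as a list of steps and totals the delivery time. It uses only the durations and temperatures stated above and leaves out the Pronase treatments, washes, and imaging, whose durations are not given; the `Step` class is a hypothetical representation, not the instrument's actual control software.

```python
# Sketch: one ligation/identification/cleavage cycle as a reagent schedule.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    name: str
    minutes: tuple           # (shortest, longest) duration stated in the text
    temp_c: Optional[float]  # None where the text gives no temperature


CYCLE = [
    Step("encoded adaptors + T4 DNA ligase", (20, 30), 16),
    Step("alkaline phosphatase + kinase", (15, 20), 37),
    Step("T4 DNA ligase", (20, 30), None),
    Step("tag complements, then imaging", (10, 10), 20),
    Step("melt tag complements", (10, 10), 55),
    Step("Bbv I cleavage", (20, 20), 37),
]


def cycle_time(steps):
    """Total (shortest, longest) reagent-delivery time for one cycle, in minutes."""
    return (sum(s.minutes[0] for s in steps), sum(s.minutes[1] for s in steps))


low, high = cycle_time(CYCLE)
print(low, high)          # 95 120
print(3 * low, 3 * high)  # 285 360 for the three cycles described
```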



Example 4

FACS Analysis of Microparticles Loaded with Different Ratios
of DNAs Labeled with Fluorescein and CY5
In this example, the sensitivity of detecting different ratios of differently
labeled cDNAs was tested by constructing a reference DNA population consisting of
a single clone and then competitively hybridizing to the reference DNA population
different ratios of complementary strands labeled with different fluorescent dyes. The
reference DNA population consisted of a cDNA clone, designated "88.11," which is
an 87-basepair fragment of an expressed gene of the human monocyte cell line
THP-1, available from the American Type Culture Collection (Rockville, Maryland)
under accession number TIB 202. The nucleotide sequence of 88.11 has a high degree
of homology to many entries in the GenBank Expressed Sequence Tag library, e.g. gb
AA830602 (98%). The reference DNA population, which consisted of only 88.11
cDNA, was prepared as described in Example 1, with the exception that a special
population of microparticles was prepared in which all microparticles had the same
tag complement attached. The corresponding oligonucleotide tag was attached to the
88.11 cDNA. Thus, only monospecific populations of tags and tag complements
were involved in the experiment. After competitive hybridization, the loaded
microparticles were analyzed on a Cytomation, Inc. (Ft. Collins, CO) FACS
instrument as described above.

88.11 cDNA was also cloned into a vector identical to that of Example 1 (330
of Figure 3b), except that it did not contain tag 336. 10 µg of vector DNA was
linearized by cleaving to completion with Sau 3A, an isoschizomer of Dpn II (342 of




Figure 3b), after which two 1 µg aliquots of the purified linear DNA were taken.
From each 1 µg aliquot, about 20 µg of labeled single stranded DNA product was
produced by repeated cycles of linear amplification using primers specific for primer
binding site 332. In one aliquot, product was labeled by incorporation of rhodamine
R110-labeled dUTP (PE Applied Biosystems, Foster City, CA); and in the other
aliquot, product was labeled by incorporation of CY5-labeled dUTP (Amersham
Corporation, Arlington Heights, IL). Quantities of the labeled products were
combined to form seven 5 µg amounts of the two products in ratios of 1:1, 2:1, 1:2,
4:1, 1:4, 8:1, and 1:8. The 5 µg quantities of labeled product were separately
hybridized to 1.6 x 10^ microparticles (GMA beads with 88.11 cDNA attached)
overnight at 65°C in 50 µl 4x SSC with 0.2% SDS, after which the reaction was
quenched by diluting to 10 ml with ice-cold TE/Tween buffer (defined above). The
loaded microparticles were centrifuged, washed by suspending in 0.5 ml 1x SSC with
0.2% SDS for 15 min at 55°C, centrifuged, and washed again by suspending in 0.5 ml
0.1x SSC with 0.2% SDS for 15 min at 55°C. After the second washing, the
microparticles were centrifuged and resuspended in 0.5 ml TE/Tween solution for
FACS analysis.
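The amounts of each labeled product needed for the seven 5 µg mixtures follow from the stated ratios. The Python sketch below computes them with exact fractions, reading each ratio as R110:CY5; the helper names are illustrative.

```python
# Sketch: amounts of each labeled product in the seven 5 ug mixtures
# (ratios read as R110:CY5).
from fractions import Fraction

RATIOS = [(1, 1), (2, 1), (1, 2), (4, 1), (1, 4), (8, 1), (1, 8)]
TOTAL_UG = 5


def split(total, r110_parts, cy5_parts):
    """Return (ug of R110 product, ug of CY5 product) for one mixture."""
    parts = r110_parts + cy5_parts
    return Fraction(total * r110_parts, parts), Fraction(total * cy5_parts, parts)


mixtures = [split(TOTAL_UG, a, b) for a, b in RATIOS]
for (a, b), (r110, cy5) in zip(RATIOS, mixtures):
    print(f"{a}:{b}  R110 {float(r110):.2f} ug  CY5 {float(cy5):.2f} ug")

need_r110 = sum(r for r, _ in mixtures)
need_cy5 = sum(c for _, c in mixtures)
print(float(need_r110), float(need_cy5))  # 17.5 17.5
```

Across all seven mixtures each product is needed in a total of 17.5 µg, which fits within the roughly 20 µg produced from each aliquot.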

The results are shown in Figures 5a-5e, where in each Figure the vertical axis
corresponds to CY5 fluorescence and the horizontal axis corresponds to rhodamine
R110 fluorescence. In Figure 5a, a population of microparticles was combined that
had either all R110-labeled DNA or all CY5-labeled DNA hybridized to the
complementary reference strands. Contours 550 and 552 are clearly distinguished by
the detection system of the FACS instrument, and microparticles of both populations
produce readily detectable signals. Figure 5b illustrates the case where the R110- and
CY5-labeled strands are hybridized in equal proportions. As expected, the resulting
contour is located on the diagonal of the graph and corresponds to the position
expected for non-regulated genes. Figures 5c through 5e show the analysis of three
pairs of competitive hybridizations: i) R110- and CY5-labeled strands hybridized in a
2:1 concentration ratio and a 1:2 concentration ratio, ii) R110- and CY5-labeled
strands hybridized in a 4:1 concentration ratio and a 1:4 concentration ratio, and iii)
R110- and CY5-labeled strands hybridized in an 8:1 concentration ratio and a 1:8
concentration ratio. The data of Figure 5c suggest that genes up-regulated or
down-regulated by a factor of two are detectable in the present embodiment, but that
significant overlap may exist between signals generated by regulated and
non-regulated genes. Figures 5d and 5e suggest that genes up-regulated or
down-regulated by a factor of four or higher are readily detectable over non-regulated
genes.



Example 5

Differentially Expressed Genes in Stimulated and Unstimulated THP-1 Cells

In this example, a reference DNA population attached to microparticles was
constructed from cDNA derived from THP-1 cells stimulated as indicated below.
Equal concentrations of labeled cDNAs from both stimulated and unstimulated THP-1
cells were then competitively hybridized to the reference DNA population, as
described in Example 1, and the microparticles carrying the labeled cDNAs were
analyzed by a FACS instrument. THP-1 cells were stimulated by treatment with
phorbol 12-myristate 13-acetate (PMA) and lipopolysaccharide (LPS).
THP-1 cells were grown in T-165 flasks (Costar, No. 3151) containing 50 ml
DMEM/F12 media (Gibco, No. 11320-033) supplemented with 10% fetal bovine
serum (FBS) (Gibco, No. 26140-038), 100 units/ml penicillin, 100 µg/ml streptomycin
(Gibco, No. 15140-122), and 0.5 nM β-mercaptoethanol (Sigma, No. M3148).
Cultures were seeded with 1 x 10' cells/ml and grown to a maximal density of 1 x 10^.
Doubling time of the cell populations in culture was about 36 hours. Cells were
treated with PMA as follows: Cells from a flask (about 5 x 10' cells) were
centrifuged (Beckman model GS-6R) at 1200 rpm for 5 minutes and resuspended in
50 ml of fresh culture media (without antibiotics) containing 5 µl of 1.0 mM PMA
(Sigma, No. P-8139) in DMSO (Gibco No. 21985-023) or 5 µl DMSO (for the
unstimulated population), after which the cells were cultured for 48 hours. Following
the 48 hour incubation, media and non-adherent cells were aspirated from the
experimental flask (i.e. containing stimulated cells) and fresh media (without
antibiotics) was added, the fresh media containing 10 µl of 5 mg/ml LPS (Sigma, No.
L4130) in phosphate buffered saline (PBS). The culture of unstimulated cells was
centrifuged (Beckman model GS-6R) at 1200 rpm for 5 minutes at 4°C so that a pellet
formed, which was then resuspended in 50 ml of fresh growth media containing 10 µl
PBS. Both the cultures of stimulated and unstimulated cells were incubated at 37°C
for four hours, after which cells were harvested as follows: Media was aspirated from
the cultures and adherent cells were washed twice with warm PBS, after which 10 ml
PBS was added and the cells were dislodged with a cell scraper. The dislodged cells
were collected and their concentration was determined with a hemocytometer, after
which they were centrifuged (Beckman model GS-6R) at 1200 rpm for 5 minutes to
form a pellet which was used immediately for RNA extraction.

mRNA was extracted from about 5x10* cells using a FastTrack 2.0 kit (No.
K1593-02, Invitrogen, Inc., San Diego, CA) for isolating mRNA. The manufacturer's
protocol was followed without significant alterations. A reference DNA population
attached to microparticles was constructed from mRNA extracted from stimulated
cells, as described in Example 1. Separate cDNA libraries were constructed from
mRNA extracted from stimulated and unstimulated cells. The vectors used for the
libraries were identical to that of Example 1, except that they did not contain
oligonucleotide tags (336 of Figure 3b). Following the protocol of Example 4,
approximately 2.5 µg of rhodamine R110-labeled single stranded DNA was produced
from the cDNA library derived from stimulated cells, and approximately 2.5 µg of
CY5-labeled single stranded DNA was produced from the cDNA library derived from
unstimulated cells. The two 2.5 µg aliquots were mixed and competitively hybridized
to the reference DNA on 9.34 x lO' microparticles. The reaction conditions and
protocol were as described in Example 4.

After hybridization, the microparticles were sorted by a Cytomation, Inc.
MoFlo FACS instrument as described above. Figure 6 contains a conventional FACS
contour plot 600 of the frequencies of microparticles with different fluorescent
intensity values for the two fluorescent dyes. Approximately 10,000 microparticles
corresponding to up-regulated genes (sort window 602 of Figure 6) were isolated, and
approximately 12,000 microparticles corresponding to down-regulated genes (sort
window 604 of Figure 6) were isolated. After melting off the labeled strands, as
described above, the cDNAs carried by the microparticles were amplified using a
commercial PCR cloning kit (Clontech Laboratories, Palo Alto, CA), and cloned into
the manufacturer's recommended cloning vector. After transformation, expansion of a
host culture, and plating, 87 colonies of up-regulated cDNAs were picked and 73
colonies of down-regulated cDNAs were picked. cDNAs carried by plasmids
extracted from these colonies were sequenced using conventional protocols on a PE
Applied Biosystems model 373 automated DNA sequencer. The identified sequences
are listed in Tables 1 and 2.



Table 1

Up-Regulated Genes

No. Copies

19
16
15
6
6
4
2
3

Description

LD78/MIP-1
TNF-inducible (TSG-6) mRNA
GRO-γ (MIP-2β)
GRO-β (MIP-2α)
act-2

guanylate binding protein isoform I (GBP-2)
spermidine/spermine N1-acetyltransferase
adipocyte lipid-binding protein
Fibronectin
interleukin-8

insulin-like growth factor binding protein 3
interferon-γ inducible early response gene
type IV collagenase
cathepsin L
EST
EST

Genomic/EST

GenBank
Identifier
HUMCKLD78
HUMTSG6A
HUMGROG5
HUMGROB
HUMACT2A

HUMGBP1
HUMSPERMNA
HUMALBP
HSFIB1
HSMDNCF

HSIGFBP3M
HSINFGER
Table 2

Down-Regulated Genes

No. Copies

16
4

Description

Elongation factor 1
Ribosomal protein S3a/v-fos transf. effector
Ribosomal protein S7
Translationally controlled tumor protein
23 kD highly basic protein
Laminin receptor
Cytoskeletal gamma-actin
Ribosomal protein L6
Ribosomal protein L10
Ribosomal protein L21

GenBank
Identifier

HSEFIAC
HUMFTEIA
HUMRPS1 7
HSTUMP
HS23KDHBP
HUMLAMR
HSACTCGR
HSRPL6AA
HUMRP10A
HSU14967

Ribosomal protein S27
Ribosomal protein L5
Ribosomal protein L9
Ribosomal protein L17
Ribosomal protein L30
Ribosomal protein L38
Ribosomal protein S8
Ribosomal protein S iT
Ribosomal protein S18

Ribosomal protein S20

Acidic ribosomal phosphoprotein P0
26S proteasome subunit p97
DNA-binding protein B

T-cell cyclophilin

Interferon inducible 6-26 mRNA

Hematopoietic proteoglycan core protein
Fau

beta-actin

Nuclear enc. mito. serine hydroxymethyltrans.

Mito. cytochrome c oxidase subunit II
Genomic

EST
EST
EST
EST
EST
EST
EST
Genomic

HUMSHMTB

HUMMTCDK
W92931
W84529
AA933890
AA206288
AA649735
N34678
AA166702
AA630799
Example 6

FACS Analysis of Differentially Expressed Genes from
Stimulated and Unstimulated THP-1 Cells

(Experiment: Comp 11)

A reference DNA population attached to microparticles was constructed from
cDNA derived from stimulated THP-1 cells. cDNA from stimulated and unstimulated
THP-1 cells was prepared for competitive hybridization as follows. 20 ng each of the
THP-1 unstimulated probe library (U3A-TL) and the THP-1 stimulated probe library
(S3A-TL) were digested with 50 units of Sau3A to prepare the vector for linear PCR.
The DNA was purified by phenol/chloroform extraction and fluorescently labeled by
PCR. For calibration purposes, both CY5 and R110 were used to label each
condition.

The U3A-TL DNA was labeled with CY5 and the S3A-TL DNA was labeled
with R110. Briefly, a reaction mixture was prepared containing 80 µl 10X PCR Buffer;
16 µl biotinylated primer (B-Primer, 125 pmole/µl); 16 µl dNTPs (625 mM); 4 µg template;
16 µl Klentaq enzyme; 64 µl R110 dUTP or 6.4 µl of CY5 dUTP; and water to bring
the total volume to 800 µl. This mixture was dispensed into 8 aliquots, which then
underwent 34 cycles of PCR according to the following protocol: 1) 94°C 3
min; 2) 94°C 30 sec; 3) 62°C 30 sec; 4) 72°C 1 min; and 5) 72°C 10 min. The PCR
reaction was purified and the colored nucleotides were removed by precipitation.
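As a rough check on run length, the thermocycling program above can be written down as data and its programmed hold time summed. This is a minimal sketch under stated assumptions, not part of the original protocol: step 1 is taken as a single initial denaturation, steps 2 to 4 as the cycle repeated 34 times, and step 5 as a single final extension; ramp times are ignored, and the names `Step` and `total_hold_seconds` are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One programmed hold: temperature in degrees C and duration in seconds."""
    temp_c: float
    seconds: int


# Program from the text; the initial/cycle/final grouping is an assumption.
INITIAL = Step(94, 3 * 60)                          # 1) 94 C, 3 min
CYCLE = [Step(94, 30), Step(62, 30), Step(72, 60)]  # 2)-4), repeated each cycle
FINAL = Step(72, 10 * 60)                           # 5) 72 C, 10 min
N_CYCLES = 34


def total_hold_seconds(initial: Step, cycle: List[Step], final: Step,
                       n_cycles: int) -> int:
    """Total programmed hold time in seconds, excluding temperature ramping."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


if __name__ == "__main__":
    total = total_hold_seconds(INITIAL, CYCLE, FINAL, N_CYCLES)
    print(f"{total} s (about {total // 60} min)")  # 4860 s (about 81 min)
```

Under these assumptions each cycle holds for 2 minutes, so the 34 cycles account for 68 of the roughly 81 programmed minutes. Dispensing the 800 µl mixture into 8 aliquots likewise corresponds to 100 µl per tube.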
Reference Population

The Comp 11 bead library consisted of 2,667,369 beads, with a complexity of
1 million clones from the THP-1 stimulated library. The beads were prepared as
described above and as outlined in Figure 3. The starting PMT2 mean for the FITC
signal was 19.5. The duplexed DNA on the beads was denatured with 2.5 ml 150 mM NaOH
washes at RT for 15 min with mild vortexing. The efficiency of the denaturation
was determined by measuring the remaining FITC signal mean, which was 2.2, i.e.,
11.3% residual fluorescence. The beads were washed twice in 0.5 ml of 4X SSC/0.1%
SDS.
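The 11.3% figure is the post-denaturation FITC mean divided by the starting mean. The sketch below reproduces that arithmetic; it assumes the PMT2 signal mean is proportional to the labeled strand remaining on the beads and does not subtract background fluorescence, and the function name is illustrative.

```python
def residual_fraction(start_mean: float, after_mean: float) -> float:
    """Fraction of the starting signal mean that remains after stripping.

    Assumes the signal mean is proportional to the labeled strand still on
    the beads; background fluorescence is not subtracted.
    """
    if start_mean <= 0:
        raise ValueError("starting mean must be positive")
    return after_mean / start_mean


START_FITC_MEAN = 19.5  # starting PMT2 mean for the FITC signal (from the text)
AFTER_FITC_MEAN = 2.2   # mean remaining after the NaOH washes (from the text)

residual = residual_fraction(START_FITC_MEAN, AFTER_FITC_MEAN)
print(f"residual {residual:.1%}, removed {1 - residual:.1%}")
# residual 11.3%, removed 88.7%
```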

Competitive Hybridization

100,000 beads were hybridized with 10 µg of each linear PCR product of the
stimulated probe library (S3A-TL) labeled with CY5 and the same library labeled
with R110. 936,542 beads were hybridized with 10 µg of CY5 stimulated probe and
10 µg of R110 unstimulated probe. The beads were assembled in 50 µl with a final
buffer composition of 4X SSC/0.1% SDS. The samples were heated to 50°C for 3
minutes, the probes were added and the temperature was moved to 65°C.
Hybridization continued for 16 hrs. with vortexing. The beads were ice quenched in
10 ml of TE Tween. The recovered samples were rinsed 2 times with 1X SSC/0.1%
SDS, resuspended in 0.5 ml of 1X SSC/0.1% SDS, and washed at 65°C for 15 min. The
beads were rinsed in 0.1X SSC/0.1% SDS and washed at 55°C in 0.1X SSC/0.1% SDS for
15 min. The samples were rinsed with TE Tween and 10,000 events of both samples
were analyzed on the BD FACSCalibur. 10,163 beads (1.15%), the brightest CY5 off
the 1:1 diagonal, were sorted. 11,977 beads (1.35%), the brightest R110 off the 1:1
diagonal, were sorted. The beads were pooled in a PCR reaction, TA cloned, and
sequenced. The identified sequences are listed in Tables 3 and 4.
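Selecting "the brightest CY5 off the 1:1 diagonal" can be approximated in software by scoring each bead with the log ratio of its two dye intensities, where a score of 0 lies on the diagonal. The sketch below is a hypothetical stand-in for the sort gates drawn on the instrument: the function name, the use of log10 ratios, the fixed-fraction cutoff and the toy intensities are all assumptions, not values from the text.

```python
import math
from typing import List, Tuple


def off_diagonal_gates(cy5: List[float], r110: List[float],
                       fraction: float) -> Tuple[List[int], List[int]]:
    """Return (cy5_gate, r110_gate) as lists of bead indices.

    Each bead is scored by log10(cy5 / r110); a score of 0 lies on the 1:1
    diagonal. The `fraction` of beads with the highest positive scores form
    the CY5 gate (most off-diagonal first), and the same fraction with the
    most negative scores form the R110 gate (most off-diagonal first).
    """
    if len(cy5) != len(r110):
        raise ValueError("channel lists must have equal length")
    if not 0 < fraction < 0.5:
        raise ValueError("fraction must be between 0 and 0.5")
    if any(v <= 0 for v in cy5) or any(v <= 0 for v in r110):
        raise ValueError("intensities must be positive")
    scores = [math.log10(c / r) for c, r in zip(cy5, r110)]
    order = sorted(range(len(scores)), key=scores.__getitem__)  # ascending
    k = int(len(scores) * fraction)
    r110_gate = [i for i in order[:k] if scores[i] < 0]
    cy5_gate = [i for i in reversed(order[len(order) - k:]) if scores[i] > 0]
    return cy5_gate, r110_gate


# Toy intensities for ten beads (made-up numbers, not data from the text).
cy5 = [100, 900, 120, 80, 50, 110, 95, 105, 400, 98]
r110 = [100, 100, 110, 90, 500, 100, 100, 100, 100, 102]
print(off_diagonal_gates(cy5, r110, 0.2))  # ([1, 8], [4, 3])
```

Ranking by ratio alone treats a dim bead far from the diagonal the same as a bright one; a closer imitation of "brightest off the diagonal" would also impose a minimum intensity, which is omitted here for brevity. The sorted fractions reported above (about 1.15% and 1.35%) suggest `fraction` values of that order would match the gate sizes, though not necessarily their shapes.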

Table 3 

Comp 11: Downregulated Genes









No. Copies 


Description 


GenBank Identifier


99
23 kD highly basic protein

1
26S proteasome subunit p55

1 1
26S proteasome subunit p97
HUM26SPSP

1
28 kD heat shock protein
HSHSP28

3 1
90 kD HSP
HSHSP90R

1 1
αNAC
HSANAC

2 1
g-enolase
HSAEP

3 1
α1 acid glycoprotein
HUMAGPIA

21
Acidic ribosomal phosphoprotein P0
HUMPPARPO

4
Acidic ribosomal phosphoprotein P1
HUMPPARP1

3
Acidic ribosomal phosphoprotein P2
HUMPPARP2

1
activin β-C chain
HSACTNBC

3
Adenylyl cyclase-associated protein (CAP)
HUMADCY

2
ADP/ATP translocase
HUMTLCA

3
Allograft-inflammatory factor 1
HSU19713

13
Antioxidant enzyme AOE37-2
HSU25182

1
Arp2/3 protein complex subunit p21-Arc
AF006084

2
Arp2/3 protein complex subunit p41-Arc
AF06086

1
ATP-dependent RNA helicase
AB001636

1
B94
HUMB94

1 7
basic transcription factor 3a (BTF3a)
HSBTF3

3
BBC1
HSBBC1

3
beta-actin
HSACTB

1 1
brain-expressed HHCPA78 homolog
S73591

1
c-myc transcription factor PuF
HUMPUF

3
calmodulin
HUMCAMA

1
cAMP response element regulatory protein
HUMCREB2A
No. Copies


1^ Calf I i|iuuu 


Identifier 


1 ii 

1 ^ V 




lUMCIS 


1 u 

1 ^ 1 




fISCKSNS2 


1 L 
1 1 


Clathrin assembly protein 50


HSU36188 


r" 1 1 
1 1 




HUMCLPB


1 < 1 
1 ^ 1 


Cu/Zn SOD-1


HSSODRl 


1 \\ 




HUMCYCLO 


1 7 

1 ^ 1 




HSCOX7AL 


1 

1 ^ i 


Cytochrome c oxidase subunit Vb


HUMCOXCA 


1 1 


Cytochrome c oxidase subunit VIc


HSCOVIC 


4 

t 1 


Cytoskeletal gamma-actin


HSACTCGR 


1 ^ 1 


Cytoskeletal tropomyosin TM30


HSTROPCR 


A 


DNA-binding protein B


HUMAAE 




EBV small RNAs associated protein (EAP)


HSEAP 


1 1 




HSEFIAC


1 1 
1 


Elongation factor 1δ


HSEFIDELA 


i 1 

1 ^ 1 


Elongation factor 1γ


HSEFIGMR 


1 ^ 1 


ERp28 protein


HSERP28 


1 ^ 1 


Fau


HSFAU


1 1 

1 ^ ' 


Ferritin L chain


HUMFERL 


1 1 

1 ^ 


Fibronectin receptor


HSFNRA 


r >i 
1 ^ 


FUS


HSFUSA 


1 


Cj-p-iuce protein 


HUMMHBA123 

j|.lT_rlTIITIIi iT &AM>a^ I 


1 ^ 


Glutaminyl-tRNA synthetase


HSGTS 

|XXi9VJ X M 1 


2 


H+-ATP synthase subunit b


|XXh^./A X X^ k9 X X^ 1 


1 ^ 


H3.3 histone, class A


|XX\^XVXXxXkJXX>^X^ 1 


1 1^ 


Heat shock factor binding protein 1




r c 
\ 3 


Heat shock protein 86


HSHSP86 

|XXl>3XXk^X O V ( 


1" A 
1 ^ 


Hematopoietic lineage cell specific protein


HSHEAM


1 2 


Hematopoietic proteoglycan core protein


|n.ijxxx \ 


1 


HLA-DR associated protein


mrcpxTApiT I 
IxlOirXXriXAi 1^ 


3 


HMG-17


|XXwlVXXXiVXVJX f 1 


1 2 


Icln chloride channel regulatory protein


m^TTlTRQQ 1 

|rxOWX/077 1 


1 2 


TT Q 


1 xx^xvxxyx^ \.^x 1 


1 1 


IMP dehydrogenase


1 XX W IrXXlVXX 1 


1 *i 
1 


Initiation factor 4B


HSINTFA4B 

1 XXhJXX^ X X *Vf»^ 1 


1 1 
1 1 


Insulinoma rig analog


HUMIDB 


1 ^ 


[ iji lericr oil luuuuiDiw o iiuvL^i^ 


HSIFNIN4 


1 1 
1 1 


|jSJ-rV/Vvl lu 


HUMORFAIO 


1 1 
1 i 


VTA AOI/^ 


D79986 


s 

1 


KIAA0571 


ABOl 1 14B 


11 


Lactate dehydrogenase B


HSLDHBR 


1 12 


Laminin receptor 


HUMLAMR 


7 


LD78/MIP-1 


HUMCKLD78 


1 


Leucine-rich protein 


HUM130LEU 


1 


LLRep3 


HSLLREP3


1 1 


low Mr GTP-binding protein (RAB32) 


HSU71127
No. Copies
1
2 
5 

T 
2 
1 



I 



1 
2 
1
1 



\ 
8 
1 



2
21

4

8
13

4
27



£
4
4
6 

T 

15 

10 

2 

1

12 

1 



1 



2 
n 

8

2 

19 



17 
13 



Description 

LST1

MAPKAP kinase (3pK)

MHC protein hom. to chicken B complex
protein

Mitochondrial cytochrome c oxidase subunit II
Mitochondrial phosphate carrier protein
Mitochondrial serine hydroxymethyl
transferase



GenBank 
Identifier

HSLSTIG
HSU09578

HUMMHBA123

HUMMTCDK

SSMPCP
HUMSHMTB



Mitochondrial tRNAs



MTTIHS 



Mitochondrial ubiquinone-binding protein    HSUBPQPC



Mn SOD-2 

Myeloid progenitor inhibitory factor (MPIF-1)
Myosin regulatory light chain
Nuclear-encoded mito. serine
hydroxymethyltransferase



P2U nucleotide receptor
Palmitoyl-protein thioesterase
Phosphate carrier



Prothymosin α

Ribosomal protein L10
ribosomal protein L11
ribosomal protein L14
ribosomal protein L17
ribosomal protein L18a
ribosomal protein L21



ribosomal protein L23 (putative)
ribosomal protein L25
ribosomal protein L26
ribosomal protein L27a
ribosomal protein L28
ribosomal protein L29
ribosomal protein L3
ribosomal protein L30
ribosomal protein L30
ribosomal protein L32
ribosomal protein L34



ribosomal protein L35
ribosomal protein L37a
ribosomal protein L38
ribosomal protein L4
ribosomal protein L41
ribosomal protein L5



22 ribosomal protein L6



ribosomal protein L7
ribosomal protein L7a



HTOSUDI S 
HSU85767 
HSMRLCM 
HUMSHMTB 



S74902 
HSU44772
HSPHOSC 



HUMTHYMA
HUMRP10A
HSRPL11
p8773S
HSRPL17

HUMRIBPROD

HSU14967 



HSL23MR
HSU12465 
HSRP26AA 
HSU14968 
HSU14969
HSRPL29
HUMRRL3A
HUMRPU OA 
HSRPL30 
HSRPL32 
HUMRPL34A 



HSU1246S 



HSRPL37A 
HSRPL38 
HSHRPU 
AF026844 
HSU14966 



HSRPL6AA 



HSRBPRL7A 
HUMRPL7A
HSU09953 



19 ribosomal protein L9






99/35293 



PCT/US99/00666 



No. Copies

Description

GenBank
Identifier

25
ribosomal protein S11
rloKJroll

5 1
ribosomal protein S13
L1G1>PQ11

11
ribosomal protein S15a
LTOppQl^A

S 1
ribosomal protein S16
nLUJYloKAA

28 1
ribosomal protein S17
tlUMKral /

35 1
ribosomal protein S18
HSRPS18

2 1
ribosomal protein S19
HUMS19Rr

6
ribosomal protein S20
LTT Tik imnooA
HUMRPS20

11
ribosomal protein S27
HSU57847

17
ribosomal protein S28 (hu homolog of yeast)
HUMRSPT

3
ribosomal protein S3
HSHUMS3

4
ribosomal protein S3a/v-fos transf. effector
protein
HUMbTlilA

6 1
ribosomal protein S4
JrlUMKPi>4A

1
ribosomal protein S7
HUMRPS7A

20 1
ribosomal protein S8
HSRPS8


1 1 


RNAse/angiogenin inhibitor 


iloKAl 1 


1 


small nuclear RNA U2 


HoUZD/oo 1 


5 1 


T-cell cyclophilin




1 1 


T-cell surface glycoprotein 1 




1 


TI-227H 


HUMTI227HC 


1 


transcriptional coactivator PC4 


HSU 12979 


1 1 


translation initiation factor 2 β subunit


HXJMELrz | 


1 


translation initiation factor eIF3 p40 subunit 


HSU54559 


2 


translationally controlled tumor protein 


HSTUMP 


1 


U1 small nuclear RNP-specific C protein


HSUIKNPC 


2 


ubiquinol-cytochrome c oxidase smallest 
subunit 


pSSoBo 


1 


ubiquinone binding protein




4 


Ubiquitin 


rloU^ycSOV 1 


2 


Ubiquitin


rlUMUoliJ 1 


7 


ubiquitin Uba52


urcT TO A ^^P 1 


11 


ubiquitin Uba80


rlolD/VoUiv 1 


8 


EST 


A A 1 AQQ^l 1 


6 


EST 


AA/D^jUO I 


3 


EST 


A1Uj331U 


2 


EST 


AAoiU lyy \ 


2 


EST 


NzoUil 1 


2 


EST 


AA843411 


2 


EST 




2 


EST 


AI034446 


2 


EST 


AI054090 


1 


EST 


AI054090 


1 


EST 


AA828574 


1 


EST 


AI087086 


1 


EST


AI031866
Table 4

Comp 11: Upregulated Genes

No. Copies
6

103
T

J.
2
47
3

Description
23 kD highly basic protein
Act-2

activated B cell factor 1
activating transcription factor 3
adenylyl cyclase-associated protein (CAP)
adipocyte lipid-binding protein
aquaporin 9

GenBank
Identifier
HS23KDHBP
HUMACT2A
AF060154
HUMATF3X
HUMADCY

HUMALBP

AB008775
HUMHOIA 



22 


B94 


HUMB94 


1 


Cathepsin B


HUMCATHB 


10 


Cathepsin L


HSCATEIL 


5 


EBV-induced protein 


HSU19261 


1 


Elongation factor 1 


HSEFIAC 


46 


Fibronectin 


HSFIB1


57 


Guanylate binding protein isoform I (GBP-2) 


HUMGBP1


2 


IFN-γ inducible early response gene


HSINFGER 


33 


IGF binding protein 3 


HSIGFBP3M 
No. Copies
1 



20 



1
_!
1
34
1
95 
10 
2 
I 
17 
19 
15 
10 
8

2 

2 
2 
1 



I 
1 



1 
4 



6
2
2
1
1 



1 



Description 

IL-1 receptor antagonist



GenBank 
Identifier

HSIIRA



HUMIL1BA
HSMDNCF



HSIGFBP3M 



3 


jKA3 mRNA induced upon T cell stimulation


HSU38443


2


KIAA0251 


p87438 


1 3 


Macrophage scavenger receptor type I


HUMRMSRl


184


MIP-1α (LD78)


HUMCKLD78


1 218


MIP-2α (GRO-β)


HUMGROB


1 50


MIP-2β (GRO-γ)


HUMGROG5


1 58 


MnSOD 


HSMNSOD 


1 4 


Musculin


AF087036


1 1


Paraplegin


MSPARAPLE
HUMPTGS2 


1 3 


Prostaglandin endoperoxide synthase-2 


1 1 


RANTES 


HUMRANTES 


1 1 


Reticulocalbin 


HUMRCN 

UCTTIitOAT 



Ribosomal protein L7

Ribosomal protein S28

Spermidine/spermine N1-acetyltransferase
Striated muscle contraction reg. protein
TNF-inducible (TSG-6) mRNA

TNFα

Translation initiation factor 2β

tRNA-Ala
Type IV collagenase

EST

EST

EST, IL-1/TNF-inducible

EST
EST



EST 

EST



EST
Genomic



Genomic



Genomic
NO MATCH
NO MATCH
NO MATCH
NO MATCH
NO MATCH



NO MATCH



HSRBPRL7A
HUMRSPT
HUMSPERMNA

HUMID2B
HUMTSG6A

HSTNFR

HUMELF2
HSCR6ALAT

HUM4COLA
AA916304 
AA873350 
AA011639 
AA346072 
AA284427 
HSEST222
W88513 
AA904231 



AA767777 
AA969937 



AA528703
HSAC000119 
AC000403 



AC004130 
No. Copies


Description 


GenBank 
Identifier 


1 


NO MATCH 




1 


NO MATCH




1157 


Total sequenced (upregulated) 








Example 7 

FACS Analysis of Differentially Expressed Genes from
Stimulated and Unstimulated THP-1 Cells

(Experiment: Comp 14) 

In a separate experiment, reference DNA population preparation and
competitive hybridization were done as described in Example 6. 9150 beads (0.89%),
the brightest CY5 off the 1:1 diagonal, were sorted. 11085 beads (1.15%), the
brightest R110 off the 1:1 diagonal, were sorted. The identified sequences are listed
in Tables 5 and 6.



Table 5



No. Copies 




n 

12 



12 



2 
9 



6
6 

5 



2 

5 



5
5 



5

5 



2 

5



Laminin receptor homolog mRNA

H.sapiens mRNA for ribosomal protein L26



Human ribosomal L5 protein mRNA



Human mRNA for elongation factor 1-alpha

H.sapiens mRNA for large subunit of ribosomal protein L21



H.sapiens gene for ribosomal protein L38



Description



Homo sapiens cDNA, 3' end /clone=IMAGE
H. sapiens rpS8 gene for ribosomal protein S8
Human ribosomal protein L3 mRNA
Human Ki nuclear autoantigen mRNA



Human ribosomal protein L7a mRNA



Novel

Human mRNA for ribosomal protein S11



Neuroblastoma RAS viral (v-ras) oncogene homolog
Human mitochondrial DNA



H.sapiens initiation factor 4B cDNA

Human endothelial-monocyte activating polypeptide II mRNA



Novel



Human monocytic leukaemia zinc finger protein (MOZ) mRNA



Human platelet activating factor acetylhydrolase, brain isoform, 45 kDa
subunit (LIS1) gene
No. Copies  Description

4  Human ferritin L chain mRNA
4  Human DNA sequence from cosmid cN37F10 on chromosome 22q11.2-qter
4  Human mRNA for core I protein
4  H.sapiens mRNA homologous to mouse P21 mRNA
4  H.sapiens mRNA for ribosomal protein L6
4  Human ribosomal protein L9 pseudogene
4  Homo sapiens cDNA, 3' end /clone=486654
4  Human MHC protein homologous to chicken B complex protein mRNA
4  Human elongation factor EF-1-alpha gene
3  H.sapiens Sub1.5 mRNA
3  Human mRNA for Aop1 Human (MER5 (Aop1-Mouse)-like protein)
3  Homo sapiens chromosome 5, P1 clone 702A10 (LBNL H56)
3  Human fumarase precursor (FH) mRNA
3  Homo sapiens cDNA, 3' end /clone=IMAGE:1695780
3  Human GST1-Hs mRNA for GTP-binding protein
3  H.sapiens mRNA for RNA polymerase II 140 kDa subunit
3  Homo sapiens ribosomal protein L30 mRNA
3  Human ribosomal protein S17 mRNA
5  HSEST222 Homo sapiens cDNA /clone=MEC-222 /gb=X84721 /gi=673398 /ug=Hs.15716 /len=558
3  Homo sapiens Arp2/3 protein complex subunit p21-Arc (ARC21) mRNA
3  Human cytoplasmic dynein light chain 1 (hdlc1) mRNA
3  Human ribosomal protein S3a mRNA
3  Human mRNA for heat shock protein hsp86
2  Homo sapiens Munc13 mRNA
2  Human translational initiation factor 2 beta subunit (eIF-2-beta) mRNA
2  Human mRNA for potential laminin-binding protein
2  Human cyclophilin-related processed pseudogene
2  Homo sapiens ribosomal protein S20 (RPS20) mRNA
2  Human acidic ribosomal phosphoprotein P1 mRNA
2  Human ribosomal protein S13 (RPS13) mRNA
2  Novel
2  Homo sapiens cDNA /clone=IMAGE:979232
2  Homo sapiens cDNA, 3' end /clone=81477
2  Human intercellular adhesion molecule-1 (ICAM-1) mRNA
2  Human mRNA for ribosomal protein L17
2  Human mRNA for carboxyl methyltransferase
2  Human mRNA for cytoskeletal gamma-actin
2  Homo sapiens cDNA, 3' end /clone=626635
2  Human nucleophosmin mRNA
2  Human ribosomal protein L10 mRNA
2  Novel
2  Y box binding protein-1 (YB-1) mRNA
2  Human guanylate binding protein isoform I (GBP-2) mRNA
2  Homo sapiens cyclin D3 (CCND3) mRNA
2  Novel
2  Homo sapiens cDNA, 3' end /clone=IMAGE:1474218
No. Copies

T

2



Description

Homo sapiens Ubiquitin mRNA sequence

Human ribosomal protein L7

Human phosphotyrosine independent ligand p62 for the Lck SH2 domain
mRNA



HSC3EF102 Homo sapiens cDNA, 3' end /clone
Total singlets: 189



Total contigs: 72 

Total seq reads in contigs: 306 
Total seqs to be searched: 610 



Table 6 

Comp 14: Upregulated Genes





No. Copies  Description

77  tiuman miviN a lor puioiivc cyiumiig yrw^^L)
31  timnan gene lor lumur xicwiums lawiui ai^a cupuo^
27  [xuman insuiin~iiK.v gruwui i.awt\/i*uiimm^ pxwvwm qv>*a%»
26  tiuman uu i o aipua gcuc
    Human mRNA for macrophage inflammatory protein-2beta (MIP2beta)
    Unman ^vtAlriTlA T .T^7R CTfinC
20  H.sapiens cDNA for spermidine/spermine N1-acetyltransferase
20  Human gene for melanoma growth stimulatory activity (MGSA)
17  Human ferritin H chain mRNA
13  Novel
13  Human adipocyte lipid-binding protein
12  Human interleukin 8 (IL8) gene
9  Homo sapiens cDNA, 3' end /clone=73864
8  Human ATL-derived PMA-responsive (APR) peptide mRNA
8  Human ATL-derived PMA-responsive (APR) peptide mRNA
7  Human cell surface glycoprotein CD44 mRNA
7  H.sapiens SOD-2 gene
7  Human hypoxanthine phosphoribosyltransferase (HPRT) gene
6  Human tumor necrosis factor-inducible (TSG-6) mRNA fragment
6  Human adenosine receptor (A2) gene
5  Human phosphatidylinositol 3-kinase catalytic subunit p110delta mRNA
4  Human BAC clone RGl 04104 no function
4  Homo sapiens adenosine triphosphatase mRNA
4  Human mRNA (3'-fragment) for (2'-5') oligo A synthetase E
4  Genomic sequence no function
2  Human type IV collagenase mRNA
2  Human ribosomal protein S17 mRNA
2  Homo sapiens cDNA, 3' end /clone=IMAGE:1459553
2  Human interleukin 1-beta (IL1B) gene
FACS Analysis of Differentially Expressed Genes from

Stimulated and Unstimulated THP-1 Cells

(Experiment: Comp 15)

In a separate experiment, cDNA from stimulated and unstimulated THP-1
cells was prepared for competitive hybridization as described in Example 6. The
reference DNA population was prepared as described in Example 6, except that the
Comp 15 bead library consisted of 2,570,000 beads, with a complexity of 1 million
clones from the THP-1 stimulated library and the THP-1 unstimulated library (50% of
each). 13,988 beads (0.87%), the brightest CY5 off the 1:1 diagonal, were sorted.
17,393 beads (1.08%), the brightest R110 off the 1:1 diagonal, were sorted. The
identified sequences are listed in Tables 7 and 8.
Table 7

Comp 15: Downregulated Genes



No. Copies  Description

25  H.sapiens mRNA for 23 kD highly basic protein
17  Homo sapiens ribosomal protein L30 mRNA
16  H.sapiens mRNA for ribosomal protein S18
13  H.sapiens mRNA for ribosomal protein L6
14  L44-like ribosomal protein (L44L) and FTP3 (FTP3) genes
i i h  Homo sapiens PYRIN (MEFV) mRNA
^ r  Human cathepsin G mRNA
8  Human mRNA for ribosomal protein S11
8  Novel
8  H.sapiens mRNA for ribosomal protein L37a
8  Novel
8  H.sapiens mRNA for ribosomal protein L26
8  H.sapiens mRNA for translationally controlled tumor protein p21 Homology
8  Human deoxyuridine triphosphatase (DUT) mRNA
8  Human growth factor independence-1 (Gfi-1) mRNA
7  Human mRNA for ribosomal protein L39
7  Human ribosomal protein L10 mRNA
6  Human ribosomal protein L9 mRNA, complete cds. 5/96
6  Homo sapiens cDNA, 3' end /clone=IMAGE:1862607 /clone_end=3' /gb=AI053436 /ug=Hs.135355 /len=138
6  Human gene for catalase Weak Homology
5  H.sapiens mRNA for ribosomal protein L7
5  Homo sapiens (clone cori-lcl5) S29 ribosomal protein mRNA
5  Human mRNA for HBp15/L22
5  H.sapiens mRNA for NEFA protein
5  Novel
5  Human mRNA for potential laminin-binding protein
5  HSEST2
4  Human ribosomal protein S16 mRNA
4  Homo sapiens cDNA /clone=IMAGE:1118473 /gb=AA603101 /gi=2436962 /ug=Hs.14214 /len=621
4  H.sapiens mRNA for large subunit of ribosomal protein L21
4  Human HMG17 gene for non-histone chromosomal protein HMG-17
4  Human ribosomal protein L5 mRNA
4  H.sapiens Uba80 mRNA for ubiquitin, 2/97
4  Human interferon-inducible mRNA
3  Homo sapiens mRNA for ribosomal protein L14
3  H.sapiens rpS8 gene for ribosomal protein S8
3  Homo sapiens monocyte/macrophage Ig-related receptor MIR-7 (MIR cl-7) mRNA
3  Homo sapiens U2 snRNP auxiliary factor small subunit
3  Homo sapiens 3-phosphoglycerate dehydrogenase mRNA
No. Copies  Description

3  Novel
   Homo sapiens aflatoxin aldehyde reductase AFAR mRNA
   Homo sapiens ribosomal protein L18a mRNA
   Homo sapiens histone H2A.F/Z variant (H2AV) mRNA
   Human ribosomal protein L27a mRNA
   H.sapiens gene for ribosomal protein L38
   Homo sapiens cDNA /clone=IMAGE:108y8yo /gb=AA584384 /gi=2368993 /ug=Hs.100437 /len=434
   Human ribosomal protein S17 mRNA
   Human cyclophilin-related processed pseudogene
   H.sapiens MUC5B gene, rearranged DNA fragment
   Homo sapiens gene for ribosomal protein L41
   Homo sapiens glia maturation factor beta mRNA
   Novel
   Human ribosomal protein L7a
   Human ribosomal protein S13 (RPS13) mRNA
   Homo sapiens cDNA /clone=IMAGE:979448 /gb=AA523303 /gi=2264015 /ug=Hs.15476 /len=640
   Human profilin mRNA
   Homo sapiens cDNA, 3' end /clone=1391189 /clone_end=3' /gb=AA781132 /ug=Hs.110803 /len=658
   Human mRNA for mitochondrial ATP synthase (F1-ATPase) alpha subunit
   Human mRNA for cytoskeletal gamma-actin
   Human mRNA for ribosomal protein L32
   H.sapiens beta-sarcoglycan gene
   Human mRNA for 26S proteasome subunit p31
   H.sapiens mRNA for ribosomal protein S15a
   Novel
   Novel
   Homo sapiens IgE receptor beta chain (HTm4) mRNA
   Human HuR RNA binding protein (HuR) mRNA
   Human alpha-tubulin mRNA
   H.sapiens mRNA for elongation factor Tu-mitochondrial
   Homo sapiens cDNA, 3' end /clone=550365 /clone_end=3' /gb=AA098869 /gi=1644973 /ug=Hs.103088 /len=526
   Homo sapiens cDNA, 3' end /clone=^8402 /clone_end=3' /gb=AA777529 /ug=Hs.11355 /len=529
   Human mRNA for proteasome subunit HsC10-II
   Homo sapiens RCL (Rcl) mRNA
   Homo sapiens clone DTIPIAIO mRNA, CAG
   Novel
   Human prothymosin-alpha gene

Total singlets: 213
Total contigs: 76
Total seq reads in contigs: 366
Total seqs to be searched: 717
Table 8

Comp 15: Upregulated Genes



No. Copies  Description

188  Human gene for tumor necrosis factor (TNF-alpha)
61  Cytochrome P450
38  H.sapiens mRNA for uridine phosphorylase
27  HuEST
14  Human tumor necrosis factor-inducible (TSG-6) mRNA fragment
11  Homo sapiens chromosome 21 clone
8  Novel
6  Novel
6  Novel
5  SOD-2 gene
5  Human LD78 beta gene
4  Adenosine receptor A2
3  Human mitochondrial DNA
3  Human spermidine/spermine N1-acetyltransferase (SSAT) gene
3  H.sapiens mRNA for 23 kD highly basic protein
3  HuEST
3  Human tumor necrosis factor alpha inducible protein A20 mRNA
2  Human spermidine/spermine N1-acetyltransferase (SSAT) gene
2  Human plasma membrane Ca2+ pumping ATPase mRNA
2  Cathepsin L
2  GRO3 oncogene MIP2-beta
2  Small inducible cytokine A4 (homologous to mouse Mip-1b) ACT2
2  GRO2 oncogene MIP2-alpha
2  Human LD78 alpha gene
2  Interleukin 8

Total singlets: 91
Total contigs: 25
Total seq reads in contigs: 404
Total seqs to be searched: 726



Example 9

Isolation of Rare Genes from Stimulated THP-1 Cells

(Experiment: Cot 3)

In this example, rare genes are isolated from stimulated THP-1 cells by
collecting beads of lower relative intensity. Bead and probe libraries were
constructed from mRNA prepared from phorbol ester treated THP-1 cultured cells.
Six bead libraries (160K complexity) were loaded twice to BP11 combitagged beads.
A total of 1,260,000 beads were sorted. The beads were filled in and ligated. The top
strand of the beads was stripped with 2.5 ml 150 mM NaOH washes at room
temperature for 15 minutes with mild vortexing. The beads were washed twice in 0.5
ml of 4X SSC/0.1% SDS. 100,000 beads were hybridized overnight with 50 ng of
CY5 labeled probe from stimulated THP-1 cells in 4X SSC/0.1% SDS at 65°C. The
recovered samples were rinsed 2 times with 1X SSC/0.1% SDS, resuspended in 0.5
ml of 1X SSC/0.1% SDS, and washed at 65°C for 15 minutes. The beads were then
rinsed in 0.1X SSC/0.1% SDS and washed at 55°C in 0.1X SSC/0.1% SDS for 15
minutes. 98,880 clones were analyzed and sorted by flow cytometry. Sample
CT003E contained 126 clones which barely hybridized any CY5 probe. Sample
CT003F contained 1557 clones that did not find enough probe to migrate to the
diagonal. These beads contained the least frequent copies in our probe library. 50
clones from each gate (see Figure 7) were picked for sequence analysis. The
identified sequences are listed in Table 9.
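Example 9 instead collects beads that captured little probe. A minimal sketch of that two-gate split follows, under stated assumptions: it considers each bead's CY5 intensity alone, and the cutoffs `floor` and `diagonal_min` are hypothetical stand-ins for the CT003E and CT003F gate boundaries of Figure 7, whose actual values are not given. The function name and toy signals are likewise illustrative.

```python
from typing import Dict, List


def rare_gene_gates(cy5: List[float], floor: float,
                    diagonal_min: float) -> Dict[str, List[int]]:
    """Assign bead indices to gates by CY5 probe signal.

    'barely_hybridized' (signal below `floor`) loosely mirrors gate CT003E,
    'below_diagonal' (between the two cutoffs) loosely mirrors gate CT003F,
    and 'on_diagonal' holds beads that captured enough probe.
    Both cutoffs are hypothetical.
    """
    if not floor < diagonal_min:
        raise ValueError("floor must be below diagonal_min")
    gates: Dict[str, List[int]] = {
        "barely_hybridized": [], "below_diagonal": [], "on_diagonal": []}
    for i, signal in enumerate(cy5):
        if signal < floor:
            gates["barely_hybridized"].append(i)
        elif signal < diagonal_min:
            gates["below_diagonal"].append(i)
        else:
            gates["on_diagonal"].append(i)
    return gates


# Toy signals (made-up numbers, not data from the text).
print(rare_gene_gates([5, 40, 300, 12, 900], floor=10, diagonal_min=100))
# {'barely_hybridized': [0], 'below_diagonal': [1, 3], 'on_diagonal': [2, 4]}
```

Because the rare sequences are by definition the ones that find the least probe, both gates sit at the low end of the probe axis; in practice a two-parameter gate (probe signal against a bead reference signal) would likely be needed to define "the diagonal" itself.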






Table 9 

THP-1 Rare Genes



No. Copies 


Description


GenBank 
Identifier 


CT003E 


2 


Alu primary transcript


U67828 


1 


AMP deaminase 


HSAMPD3B 


1 


BBC1


HSBBC1


14 


CD44 


HUMCD44B 


1 


clone 23933 mRNA 


HSU79273 


7 


EST 


AA905212 


1 


EST 


AA975736 


1 


EST 


N53143 


1 


EST 


AA808221 


1 


EST 


AA826047 


1 


EST 


AA736779 


1 


EST 


AA994497 


1 


EST 


AI049999 


1 


EST (88% homology) 


AA626040 


8 


EST (contains Alu repeat) 


AA129219 


9 


EST (contains Alu repeat) 


AI085719 


1 


EST (contains Alu repeat) 


W07654 


1 


EST (Sau3A not present) 


AA553627 


3 


fenitin H chain 


HUMFERH 


1 


0.1 p 


HUMILIBA 
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f"^^ Ant AC 

         1            KIAA0005 gene mRNA                              HUMKG1DD
                      mito. cyt oxidase subunit I pseudogene          AF035429
         1            Mitochondrial genome                            MfflSGENOM
         1            NADH:ubiquinone oxidoreductase NDUFS6 subunit   AF044959
         2            no match
         1            no match
         1            only 12 bases
         1            only 19 bases
         1            only 1? bases
         1            only 1? bases
         4            TNFα                                            HSTNFR
         7            TNF type I recept. assoc. prot/DNAse I/HSP75    HSU12595/D83195/AF043254
                      type IV collagenase                             HUM4COLA
                      Ubiquitin hydrolyzing enzyme I (UBH1)           AF022789
                      VASP gene                                       HSVASP413

CT003F                Apolipoprotein C-II                             HSAPOC2G
                      BBC1                                            HSBBC1
                      clone si S3 mRNA fragment                       HUMFRCC
                      cytoskeletal γ actin                            HSACTCGR
                      elongation factor 1α                            HSEF1AC
                      EST                                             AA905212
                      EST                                             AA977353
                      EST                                             AA135810
                      EST (contains Alu repeat)                       H08741
                      EST                                             AA282788
                      EST                                             AA226660
                      EST (85% homology; contains Alu, CACA tract)    AA704393
                      EST (86% homology; contains Alu)                H60533
                      EST (88% homology)                              AA228701
                      EST (contains Alu repeat)                       AI085719
                      EST (contains Alu repeat)                       AA713891
                      EST (rat)                                       AI136745
                      ferritin H chain                                HUMFERH
                      genomic (72 bp; 88% homology)                   HSAC002082
                      ICAM-1                                          HUMICAMAIM
                      IL1β                                            HUMIL1BA
                      Interferon γ receptor accessory factor 1        HSU05877
                      mito. cyt oxidase subunit I pseudogene          AF035429
                      no match
                      no match
Example 10

Isolation of Rare Genes From Human Bone Marrow

Bead and probe libraries were constructed from commercially available
mRNA from bone marrow. Six bead libraries (160K complexity) were loaded twice
to BP 12 combitagged beads. They formed mixes 216, 217, 218, and 219. A total of
3,150,000 beads were sorted. The beads were filled in and ligated. The top strand of
mix 217 was stripped off with NaOH. The CT1 bone marrow probe was linearly
amplified with CY5 nucleotides and then purified. 200,000 beads were hybridized
overnight at 65°C with either 5 ng or 50 ng of probe. 180,000 clones from the 5 ng
hybridization were interrogated and sorted. Sample CT001 contained 996 clones
which barely hybridized any CY5 probe. Sample CT002 contained 1988 clones that
did not find enough probe to migrate to the diagonal. These beads contained the least
frequent copies in our probe library. 200 clones from each gate (see Figure 8) were
picked for sequence analysis.
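Why a lower probe input leaves rare species dim can be illustrated with a simplified kinetic model. This model is an assumption for illustration and is not stated in the example: it treats each probe species as remaining in excess over its bead-bound target, so that the fraction of target sites hybridized approaches saturation as 1 - exp(-k·c·t), with the species' concentration c taken as proportional to its relative abundance times the probe mass. The function name `fraction_hybridized` and the lumped constant `k` are hypothetical.

```python
import math


def fraction_hybridized(rel_abundance: float, probe_ng: float, k: float = 1.0) -> float:
    """Expected fraction of bead-bound target sites occupied after hybridization.

    Pseudo-first-order model (an assumption): the probe species concentration is
    proportional to rel_abundance * probe_ng and is not depleted, so occupancy
    follows 1 - exp(-k * c). `k` lumps the rate constant and hybridization time.
    """
    if rel_abundance < 0 or probe_ng < 0:
        raise ValueError("abundance and probe mass must be non-negative")
    return 1.0 - math.exp(-k * rel_abundance * probe_ng)


# Compare a 5 ng and a 50 ng hybridization for species of different abundance
# (arbitrary relative units).
for abundance in (0.01, 0.1, 1.0):
    low = fraction_hybridized(abundance, 5)
    high = fraction_hybridized(abundance, 50)
    print(f"abundance {abundance:>5}: 5 ng -> {low:.3f}, 50 ng -> {high:.3f}")
```

Under this model, the two more abundant classes are both close to saturation at 50 ng (about 0.993 and 1.000) but remain separated at 5 ng (about 0.393 and 0.993), while the rarest class stays near background at 5 ng (about 0.049). This is one possible reading of why the lower-input hybridization suits gating on the least-hybridized beads.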



Example 11 

Analysis of Differentially Expressed Genes from
Normal and Glucose Starved Human Muscle Tissue

Bead and probe libraries were constructed from mRNA prepared from muscle
tissue in two states: glucose normal (basal) and glucose starved (clamp). Six bead
libraries (160K complexity) from the glucose normal state were loaded to BP 12
combitagged beads to form mix 237. A total of 810,000 beads were sorted. The
beads were filled in and ligated. The beads were digested with DpnII enzyme and
ligated to an adapter with FITC on the strand opposite to the covalently attached DNA
strand. The top strand of mix 217 was stripped off with NaOH. The CT1 glucose
normal probe (13,510,000 complexity) was linearly amplified with CY5 nucleotides
and then purified. The CT2 glucose starved probe (7,132,000 complexity) was
linearly amplified with R110 nucleotides and then purified. 250,000 beads were
hybridized with 5 µg of each probe overnight at 65°C. 230,000 clones were interrogated
and sorted. Sample UP001 contained 968 clones which were upregulated. Sample
DN001 contained 1652 clones which were downregulated. 1000 clones from each
gate (see Figure 9) were picked for sequence analysis. The identified sequences are
listed in Tables 10 and 11.
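The two-color sort in this example classifies each bead by the relative signal of the two probes. A minimal sketch of that classification follows, under stated assumptions: each bead is reduced to one CY5 (glucose normal, CT1) and one R110 (glucose starved, CT2) intensity, "upregulated" is read as a higher R110-to-CY5 ratio, and the log2 `threshold` and background `floor` are hypothetical parameters rather than gate settings taken from Figure 9.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Bead:
    cy5: float   # signal from the CY5-labelled glucose-normal (CT1) probe
    r110: float  # signal from the R110-labelled glucose-starved (CT2) probe


def classify(bead: Bead, threshold: float = 1.0, floor: float = 1.0) -> str:
    """Assign a bead to an 'up', 'down' or 'unchanged' gate by log2(R110 / CY5).

    `threshold` is in log2 units (1.0 = two-fold); `floor` is added to both
    channels so beads with near-zero signal do not give infinite ratios.
    Both values are hypothetical.
    """
    log_ratio = math.log2((bead.r110 + floor) / (bead.cy5 + floor))
    if log_ratio >= threshold:
        return "up"        # relatively more starved-state probe bound
    if log_ratio <= -threshold:
        return "down"      # relatively more normal-state probe bound
    return "unchanged"


beads = [Bead(cy5=100, r110=900), Bead(cy5=800, r110=90), Bead(cy5=300, r110=310)]
print(Counter(classify(b) for b in beads))
```

Working on log ratios keeps up- and down-regulation symmetric around zero, so a single threshold defines both gates.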
Table 10

Downregulated Genes in Starved Human Muscle



No. Copies 



Description 



23 
27 



14 



Human mRNA for slow skeletal troponin C
Human alkali myosin light chain 1 mRNA



Human messenger RNA for beta-globin
Human lymphocytic antigen CD59/MEM43 mRNA
P6=cytochrome c oxidase subunit VIc homolog



13 

10



8 



H.sapiens mRNA homologous to mouse P21 mRNA 



Human SPARC/osteonectin mRNA 



3', mRNA sequence 
Pan troglodytes beta-2-microglobulin mRNA 



reductase 

Homo sapiens gene for ribosomal protein L41



3 
2 



Homo sapiens ribosomal protein L30 mRNA 



IMAGE: 1388067 

ni65c01.s1 NCI_CGAP_Pr12 Homo sapiens cDNA clone
IMAGE:981696 




Table 11

Upregulated Genes in Starved Human Muscle

No. Copies   Description

4            Human mitochondrion cytochrome b gene
4            Homo sapiens sarcosin mRNA
4            laminin receptor homolog
3            H.sapiens mRNA for 23 kD highly basic protein
3            Human ENO3 mRNA for beta-enolase
3            alpha-tropomyosin
3            alpha B-crystallin
3            Human mRNA for muscle phosphofructokinase
2            Baboon beta-myosin heavy-chain mRNA
2            Human mRNA 3'-fragment for glycogen phosphorylase
2            Human ribosomal L5 protein mRNA
2            H.sapiens mRNA for ribosomal protein L37a
2            Human cytochrome c oxidase subunit VIII (COX8) mRNA
All publications and patent applications mentioned in this specification are
herein incorporated by reference to the same extent as if each individual publication
or patent application was specifically and individually indicated to be incorporated by
reference.

The invention now being fully described, it will be apparent to one of ordinary
skill in the art that many changes and modifications can be made thereto without
departing from the spirit or scope of the appended claims.

We claim:



1. A method of analyzing differential gene expression, comprising:
providing a reference population of nucleic acid sequences attached to
separate solid phase supports in clonal subpopulations;

providing a population of polynucleotides of expressed genes from a first cell
or tissue source and at least one population of polynucleotides of expressed genes
from a different cell or tissue source, the polynucleotides of expressed genes from
each source comprising a light-generating label different from the label comprised by
polynucleotides of any other source;

competitively hybridizing the populations of polynucleotides of expressed
genes from each source with the reference nucleic acid population to form duplexes
between the nucleic acid sequences of the reference nucleic acid population and the
polynucleotides of each source such that the polynucleotides are present in duplexes
on each of the solid phase supports in ratios directly related to the relative expression
of their corresponding genes in the sources; and

detecting a relative optical signal generated by the light-generating labels of
the duplexes attached thereto.

2. The method of Claim 1, wherein said nucleic acid sequences are DNA
sequences.

3. The method of Claim 2, wherein said step of providing said reference
population further includes:
forming at least one population of tag-cDNA conjugates from mRNA
extracted from at least one of said sources and a repertoire of oligonucleotide tags;
removing a sample of the tag-cDNA conjugates; and
amplifying the tag-cDNA conjugates of the sample.

4. The method of Claim 3, wherein said populations of tag-cDNA
conjugates are formed from mRNA extracted from each of said sources, the method
further comprising combining said populations of tag-cDNA conjugates from each of
said sources prior to removing said sample.

5. The method of Claim 4, wherein said sample is sufficiently small
relative to said total tag-cDNA conjugates that substantially all different cDNAs have
different oligonucleotide tags.

6. The method of Claim 5, wherein said step of providing said reference
population further includes attaching said tag-cDNA conjugates of said sample to said
separate solid phase supports by specifically hybridizing said oligonucleotide tags of
said tag-cDNA conjugates to their respective complements.

7. The method of Claim 6, wherein said step of amplifying comprises
replicating said tag-cDNA conjugates of said sample in a polymerase chain reaction.

8. The method of Claim 6, wherein said step of amplifying comprises
replicating said tag-cDNA conjugates of said sample by inserting said tag-cDNA
conjugates into a cloning vector and transfecting a host cell therewith.

9. The method of Claim 6, wherein said sample includes a number of
oligonucleotide tags less than or equal to one percent of said oligonucleotide tags in
said repertoire.

10. The method of Claim 2, wherein said reference DNA population is
derived from said expressed genes of all of said sources being analyzed.

11. The method of Claim 2, further comprising sorting each solid phase
support according to said relative optical signal.

12. The method of Claim 2, wherein said different light-generating labels
are different fluorescent labels.

13. The method of Claim 12, wherein said populations of polynucleotides
of expressed genes are populations of cDNAs.

14. The method of Claim 13, further comprising the steps of:
accumulating each said solid phase support having said relative optical signal
with a value within one or more predetermined ranges of values corresponding to a
difference in gene expression among said sources; and

identifying said polynucleotides on each of said solid supports by determining
a nucleotide sequence of a portion of each of said polynucleotides.

15. The method of Claim 14, wherein said relative optical signal is a ratio
of fluorescence intensities and wherein said populations of polynucleotides are from
two sources.



16. The method of Claim 15, wherein said portion of said polynucleotides
is a sequence of at least ten nucleotides.

17. The method of Claim 15, wherein said step of identifying includes
simultaneous sequencing of at least ten thousand of said polynucleotides by massively
parallel signature sequencing.



18. A method of isolating polynucleotides derived from genes
differentially expressed in a plurality of different cells or tissues, the method
comprising the steps of:

providing a reference DNA population of DNA sequences attached to separate
microparticles in clonal subpopulations;

providing a population of polynucleotides derived from genes expressed in
each of the plurality of different cells or tissues, each polynucleotide having a
light-generating label capable of generating an optical signal indicative of the cells or
tissues from which it is derived;

competitively hybridizing the populations of polynucleotides of genes
expressed in each of the plurality of different cells or tissues with the reference DNA
population to form duplexes between the DNA sequences of the reference DNA
population and polynucleotides from each of the different cells or tissues such that the
polynucleotides are present in duplexes on each of the microparticles in ratios directly
related to the relative expression of their corresponding genes in the different cells or
tissues; and

isolating polynucleotides corresponding to genes differentially expressed in
the different cells or tissues by sorting microparticles in accordance with the optical
signals generated by the populations of polynucleotides hybridized thereto.

19. The method of Claim 18, wherein said reference DNA population is
derived from genes expressed in the plurality of different cells or tissues being
analyzed.

20. The method of Claim 19, wherein said plurality of different cells or
tissues is two and wherein said optical signal is a fluorescent signal.

21. The method of Claim 20, wherein said populations of polynucleotides
are labeled with different fluorescent labels.

22. The method of Claim 21, wherein said populations of polynucleotides
are populations of cDNAs.

23. The method of Claim 22, wherein said step of competitively
hybridizing includes providing hybridization conditions which result in substantially
all of said duplexes being perfectly matched duplexes.

24. The method of Claim 23, wherein said step of isolating includes
sorting said microparticles in accordance with the ratio of fluorescence intensities
generated by said populations of cDNAs hybridized thereto.

25. The method of Claim 24, wherein said step of isolating includes
sorting said microparticles with a fluorescence-activated cell sorter.

26. The method of Claim 25, further including the step of identifying said
isolated cDNAs by determining a nucleotide sequence of a portion of each said
isolated cDNA.

27. A method of determining relative abundance of gene products,
comprising:

providing a reference DNA population of DNA sequences attached to separate
solid phase supports in clonal subpopulations;

providing a population of polynucleotides derived from genes expressed in at
least one cell or tissue source, the polynucleotides having a light-generating label;

hybridizing the polynucleotides with the reference DNA population to form
duplexes between the DNA sequences of the reference DNA population and the
polynucleotides; and

sorting each solid phase support according to the optical signal generated by
the light-generating labels of the duplexes attached thereto,

wherein relative abundance of the gene products is correlated with the relative
level of intensity of the optical signals obtained from the duplexes, wherein a lower
intensity is indicative of a rarer gene product.

28. The method of Claim 27, further comprising isolating solid phase
supports having lower relative intensities, wherein said isolated solid phase supports
comprise at most about 5% of the total solid phase supports provided.

29. The method of Claim 28, wherein said isolated solid phase supports
comprise at most about 0.5% of the total supports provided.

30. A method of isolating polynucleotides according to the abundance of
the nucleic acid sequences from which they are derived, comprising:

providing a reference DNA population of DNA sequences attached to separate
microparticles in clonal subpopulations;

providing a population of polynucleotides derived from nucleic acid sequences
present in the cells of at least one cell or tissue source, each polynucleotide having a
light-generating label capable of generating an optical signal;

competitively hybridizing the population of polynucleotides with the reference
DNA population to form duplexes between the DNA sequences of the reference DNA
population and the polynucleotides, the hybridizing being conducted under conditions
which provide a hybridization rate proportionate to the abundance of the
polynucleotide wherein less abundant polynucleotides would remain unhybridized; and

sorting the polynucleotides into a hybridized population and an unhybridized
population.

31. The method of Claim 30, wherein said polynucleotides are hybridized
with said reference DNA population under conditions such that said unhybridized
population comprises polynucleotides derived from rare gene products.

32. The method of Claim 30, wherein said polynucleotides are hybridized
with said reference DNA population under conditions such that said unhybridized
population is substantially enriched in polynucleotides derived from nonrepetitive
nucleic acid sequences.



33. A composition comprising a mixture of microparticles, each
microparticle having a population of identical single stranded nucleic acid molecules
attached thereto, the single stranded nucleic acid molecules being different on each
microparticle and comprising an oligonucleotide tag in juxtaposition with a
polynucleotide derived from an mRNA of at least one cell or tissue source.

34. The composition of Claim 33, wherein said nucleic acid molecules are
DNA.

35. The composition of Claim 34, wherein said polynucleotides are
derived from a plurality of cell or tissue sources.

36. The composition of Claim 35, wherein said mixture comprises at least
100 different microparticles.

37. The composition of Claim 35, wherein said mixture comprises at least
1000 different microparticles.

38. The composition of Claim 35, wherein said mixture comprises at least
10^ different microparticles.

39. The composition of Claim 35, wherein said oligonucleotide tag is
about 12 to about 60 nucleotides in length.

40. The composition of Claim 35, wherein said oligonucleotide tag is
about 18 to about 40 nucleotides in length.

41. The composition of Claim 35, wherein said oligonucleotide tag is
about 25 to about 40 nucleotides in length.

42. A composition comprising a mixture of microparticles, each
microparticle having a population of identical single stranded nucleic acid molecules
attached thereto, the single stranded nucleic acid molecules being different on each
microparticle and each of the different nucleic acid molecules comprising a
polynucleotide encoding a protein selected from the group consisting of cell cycle
proteins, signal transduction pathway proteins, oncogene gene products, tumor
suppressors, kinases, phosphatases, transcription factors, growth factor receptors,
growth factors, extracellular matrix proteins, proteases, cytoskeletal proteins,
membrane receptors, Rb pathway proteins, p53 pathway proteins, proteins involved in
metabolism, proteins involved in cellular responses to stress, cytokines, proteins
involved in DNA damage and repair, and proteins involved in apoptosis.

43. The composition of Claim 42, wherein each of said nucleic acid
molecules further comprises an oligonucleotide tag in juxtaposition with said
polynucleotide and positioned between said microparticle and said polynucleotide.

44. The composition of Claim 43, wherein each of said microparticles
comprises a set of oligonucleotide tags having a sequence different from the
oligonucleotide tags of any other microparticle in said composition.

45. The composition of Claim 42, wherein said polynucleotides encode
kinases.

46. The composition of Claim 42, wherein said polynucleotides encode
cell-cycle proteins.

47. The composition of Claim 42, wherein said polynucleotides encode
signal transduction pathway proteins.

48. The composition of Claim 42, wherein said polynucleotides encode
proteins involved in apoptosis.

49. The composition of Claim 42, wherein said polynucleotides encode
proteins involved in metabolism.

50. A kit for preparing a reference population, comprising:
a plurality of microparticles having oligonucleotide tag complements attached
thereto, the oligonucleotide tag complement sequence being different on each
microparticle.

51. The kit of Claim 50, further comprising a plurality of vectors
comprising a library of tags, the tags having sequences complementary to said tag
complements.

52. The kit of Claim 51, further comprising a population of
polynucleotides from at least one cell or tissue source.

53. The kit of Claim 52, wherein said polynucleotides are cDNAs.

54. The kit of Claim 52, wherein said population of polynucleotides is
contained in a container separate from said plurality of microparticles.

55. The kit of Claim 51, further comprising at least one reagent for
preparing said reference population.

56. A kit for analyzing differentially expressed genes, comprising:
a mixture of microparticles, each microparticle having a population of
identical single stranded nucleic acid molecules attached thereto, the single stranded
nucleic acid molecules being different on each microparticle and comprising a
polynucleotide derived from an mRNA of at least one cell or tissue source.

57. The kit of Claim 56, wherein each of said nucleic acid molecules
further comprises an oligonucleotide tag in juxtaposition with said polynucleotide and
positioned between said microparticle and said polynucleotide.

58. The kit of Claim 56, further comprising printed instructions for use in
analyzing differentially expressed genes.

59. The kit of Claim 56, further comprising a container.

60. The kit of Claim 56, further comprising a population of cDNA
molecules from at least one of said cell or tissue sources.
Sequence Listing

<110> Albrecht, Glenn; Brenner, Sydney; DuBridge, Robert B.
<120> solid phase selection of differentially expressed genes

<130> 822-02

<140> US 09/130,546

<141> 1998-08-06

<150> US 09/005,222
<151> 1998-01-09
<160> 25

<170> Microsoft Word 5.1

<210> 1 
<211> 89 
<212> DNA 

<213> Artificial Sequence 

<220> 

<221> 

<222> 

<223> 

<400> 1 

agaattcggg ccttaattaa dddddddddd dddddddddd dddddddddd 50 
ddgggcccgc ataagtcttc nnnnnnggat ccgagtgat 89 

<210> 2 
<211> 41 
<212> DNA 

<213> Artificial Sequence 

<220> 
<221> 
<222> 
<223> 
<400> 2 

gacatgctgc attgagacga ttcttttttt tttttttttt v 41 

<210> 3
<211> 52 
<212> DNA

<213> Artificial Sequence 

<220> 

<221> 

<222> 

<223> 

<400> 3 

gacatgctgc attgagacga ttcttttttt tttttttttt viuumgatcn 50
nn 52

<210> 4 
<211> 37 
<212> DNA 

<213> Artificial Sequence 

<220> 
<221> 
<222> 
<223> 
<400> 4 

gcattgagac gattcttttt tttttttttt ttvimim 

<210> 5 
<211> 73 
<212> DNA

<213> Artificial Sequence 
<220> 
<221> 
<222> 
<223> 
<400> 5 



ttaattaagg addddddddd dddddddddd dddddddddd dddgggcccg 50
cataagtctt cnnnnnngga tcc 73

<210> 6 
<211> 63 
<212> DNA

<213> Artificial Sequence 

<220> 

<221>

<222> 

<223> 

<400> 6 

ccchhhhhhh hhhhhhhhhh hhhhhhhhhh hhhhhtcctt aattaactgg 50 
tctcactgtc gca 

<210> 7 
<211> 18 
<212> DNA 

<213> Artificial sequence 

<220> 

<221> 

<222> 

<223> 

<400> 7 

gatcacgagc tgccagtc 18

<210> 8
<211> 22
<212> DNA

<213> Artificial Sequence

<220>

<221>

<222>

<223>

<400> 8

agtgaattcg ggccttaatt aa 
<210> 9 

<211> 32 
<212> DNA 

<213> Artificial Sequence 

<220> 

<221> 

<222> 

<223> 

<400> 9 

ctacccgcgg ccgcggtcga ctctagagga tc 

<210> 10 
<211> 30 

<212> DNA 

<213> Artificial Sequence

<220> 
<221> 
<222> 
<223> 
<400> 10 

annntacagc tgcatccctt ggcgctgagg 

<210> 11 
<211> 30 
<212> DNA 

<213> Artificial Sequence 

<220> 
<221> 
<222> 
<223> 

<400> 11

nanntacagc tgcatccctg ggcctgtaag 

<210> 12 
<211> 30 
<212> DNA

<213> Artificial Sequence 
<220> 

<221> 
<222> 
<223> 
<400> 12 

cnnntacagc tgcatccctt gacgggtctc

<210> 13 
<211> 30 
<212> DNA 

<213> Artificial Sequence
<220> 
<221> 
<222> 
<223> 
<400> 13 

ncnntacagc tgcatccctg cccgcacagt 

<210> 14 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 
<221> 
<222> 
<223> 

SUBSTITUTE SHEET (RULE 28)

WO 99/35293    PCT/US99/00666



<400> 14 

gnnntacagc tgcatccctt cgcctcggac 

<210> 15 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<221> 
<222> 
<223> 
<400> 15 

ngnntacagc tgcatccctg atccgctagc 

<210> 16 
<211> 30 
<212> DNA 

<213> Artificial Sequence

<220> 

<221> 

<222> 

<223> 

<400> 16 

tnnntacagc tgcatccctt ccgaacccgc 

<210> 17 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<221> 
<222> 
<223> 

<400> 17

ntnntacagc tgcatccctg agggggatag 

<210> 18 
<211> 30 
<212> DNA

<213> Artificial Sequence 
<220> 

<221> 
<222> 
<223> 
<400> 18 

nnantacagc tgcatccctt cccgctacac 

<210> 19 
<211> 30 
<212> DNA

<213> Artificial Sequence
<220> 

<221> 
<222> 
<223> 
<400> 19 

nnnatacagc tgcatccctg actccccgag 

<210> 20 
<211> 30 
<212> DNA 

<213> Artificial Sequence 

<220> 

<221> 

<222> 

<223> 
